Abstract

The entanglement-assisted classical capacity of a noisy quantum channel (C_E)
is the amount of information per channel use that can be sent over the channel
in the limit of many uses of the channel, assuming that the sender and receiver
have access to the resource of shared quantum entanglement, which may be used
up by the communication protocol. We show that the capacity C_E is given
by an expression parallel to that for the capacity of a purely classical channel,
i.e., the maximum, over channel inputs ρ, of the entropy of the channel input
plus the entropy of the channel output minus their joint entropy, the latter being
defined as the entropy of an entangled purification of ρ after half of it has
passed through the channel. We calculate entanglement-assisted capacities for
two interesting quantum channels, the qubit amplitude damping channel and the
bosonic channel with amplification/attenuation and Gaussian noise. We discuss
how many independent parameters are required to completely characterize the
asymptotic behavior of a general quantum channel, alone or in the presence of
ancillary resources such as prior entanglement. In the classical analog of
entanglement-assisted communication, namely communication over a discrete memoryless
channel (DMC) between parties who share prior random information, we show
that one parameter is sufficient, i.e., that in the presence of prior shared random
information, all DMC's of equal capacity can simulate one another with unit
asymptotic efficiency.



*IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA; †AT&T Labs - Research,
Florham Park, NJ 07932, USA; ‡Dept. of Physics, U. C. Santa Barbara, Santa Barbara, CA 93106,
USA. AVT acknowledges support from US Army Research Office under grants DAAG55-98-C0041 and
DAAG55-98-1-0366. Further, AVT wishes to acknowledge support from IBM Research and David
D. Awschalom (UCSB). CHB and JAS acknowledge support from the National Security Agency
and the Advanced Research and Development Activity through the U. S. Army Research Office,
contract DAAG55-98-C-0041. The material in this paper was presented in part at the European
Science Foundation Conference on Quantum Information: Theory, Experiment, and Perspectives, in
Gdansk, Poland, July 2001.






I. Introduction 



The formula for the capacity of a classical channel was derived in 1948 by Shannon.
It has long been known that this formula is not directly applicable to channels
with significant quantum effects. Extending this theorem to take quantum effects
into account has been harder than might have been anticipated; despite much recent
effort, we do not yet have a comprehensive theory for the capacity of quantum channels.
The book of Nielsen and Chuang [28] and the survey paper [?] are two sources
giving good overviews of quantum information theory. In this paper, we advance
quantum information theory by proving a capacity formula for quantum channels
which holds when the sender and receiver have access to shared quantum entangled
states which can be used in the communication protocol. We also present a conjecture
that would imply that, in the presence of shared entanglement, to first order
this entanglement-assisted capacity is the only quantity determining the asymptotic
behavior of a quantum channel.

A (memoryless) quantum communications channel can be viewed physically as 
a process wherein a quantum system interacts with an environment (which may be 
taken to initially be in a standard state) on its way from a sender to a receiver; it may 
be defined mathematically as a completely positive, trace-preserving linear map on 
density operators. The theory of quantum channels is richer and less well understood 
than that of classical channels. For example, quantum channels have several distinct 
capacities, depending on what one is trying to use them for, and what additional 
resources are brought into play. These include 

• The ordinary classical capacity C, defined as the maximum asymptotic rate at
which classical bits can be transmitted reliably through the channel, with the
help of a quantum encoder and decoder.

• The ordinary quantum capacity Q, which is the maximum asymptotic rate at
which qubits can be transmitted under similar circumstances.

• The classically assisted quantum capacity Q_2, which is the maximum asymptotic
rate of reliable qubit transmission with the help of unlimited use of a 2-way
classical side channel between sender and receiver.

• The entanglement-assisted classical capacity C_E, which is the maximum asymptotic
rate of reliable bit transmission with the help of unlimited prior entanglement
between the sender and receiver.
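The description of a channel as a completely positive, trace-preserving map can be made concrete through a Kraus representation, 𝒩(ρ) = Σ_i K_i ρ K_i†, which is trace preserving exactly when Σ_i K_i†K_i = I. The following is a minimal standard-library sketch, assuming the qubit amplitude damping channel as the example; all function names are hypothetical helpers, and matrices are plain nested lists.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def apply_channel(kraus, rho):
    """N(rho) = sum_i K_i rho K_i^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out

def is_trace_preserving(kraus, tol=1e-12):
    """Check the completeness condition sum_i K_i^dagger K_i = I."""
    n = len(kraus[0])
    total = [[0j] * n for _ in range(n)]
    for k in kraus:
        term = matmul(dagger(k), k)
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return all(abs(total[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def amplitude_damping(gamma):
    """Illustrative Kraus operators of the qubit amplitude damping channel."""
    return [[[1, 0], [0, math.sqrt(1 - gamma)]],
            [[0, math.sqrt(gamma)], [0, 0]]]

kraus = amplitude_damping(0.3)
rho = [[0.5, 0.5], [0.5, 0.5]]                       # the pure state |+><+|
out = apply_channel(kraus, rho)
print(is_trace_preserving(kraus))                    # True
print(round((out[0][0] + out[1][1]).real, 12))       # 1.0
```

Plain nested lists keep the sketch dependency-free; beyond qubits, a numerical linear algebra library would be the natural choice.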



Somewhat unexpectedly, the last of these has turned out to be the simplest to
calculate, because, as we show in section II, it is given by an expression analogous
to the formula expressing the classical capacity of a classical channel as the maximum,
over input distributions, of the input:output mutual information. Section III
calculates entanglement-assisted capacities of the amplitude damping channel and of
amplifying and attenuating bosonic channels with Gaussian noise.

Table 1: Capacities of several quantum channels.

Channel                                               Q    Q_2    C          C_E
Noiseless qubit channel                               1    1      1          2
50% erasure qubit channel                             0    1/2    1/2        1
2/3 depolarizing qubit channel                        0    0      0.0817*    0.2075
Noiseless bit channel = 100% dephasing qubit channel  0    0      1          1

* Proved in [?].

We return now to a general discussion of quantum channels and capacities, in 
order to provide motivation for section IV of the paper, on what we call the reverse 
Shannon theorem. 

Aside from the constraints Q ≤ C ≤ C_E and Q ≤ Q_2, which are obvious consequences
of the definitions, the four capacities appear to vary rather independently. It
is conjectured that Q_2 ≤ C, but this has not been proved to date. Except in special
cases, it is not possible, without knowing the parameters of a channel, to infer any
one of its four capacities from the other three. This independence is illustrated in
Table 1, which compares the capacities of several simple channels for which they are
known exactly. The channels incidentally illustrate four different degrees of qualitative
quantumness: the first can carry qubits unassisted, the second requires classical
assistance to do so, the third has no quantum capacity at all but still exhibits quantum
behavior in that its capacity is increased by entanglement, while the fourth is
completely classical, and so unaffected by entanglement.
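Several entries of Table 1 can be reproduced from closed-form expressions that we quote here rather than derive: for a qubit erasure channel with erasure probability ε, Q = max(0, 1 − 2ε), Q_2 = C = 1 − ε and C_E = 2(1 − ε); for the depolarizing channel, which we read as 𝒩(ρ) = (1 − p)ρ + pI/2 with p = 2/3, a pure input leaves output eigenvalues 1 − p/2 and p/2, giving C = 1 − h(p/2) with h the binary entropy. The sketch below simply evaluates these formulas, which are assumptions here rather than results of this section; the function names are illustrative.

```python
import math

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def erasure_capacities(eps):
    """(Q, Q2, C, C_E) of the qubit erasure channel with erasure probability eps."""
    return (max(0.0, 1 - 2 * eps), 1 - eps, 1 - eps, 2 * (1 - eps))

def depolarizing_classical_capacity(p):
    """C of rho -> (1 - p) rho + p I/2; a pure input gives output eigenvalues 1 - p/2, p/2."""
    return 1 - h(p / 2)

print(erasure_capacities(0.5))                            # (0.0, 0.5, 0.5, 1.0)
print(round(depolarizing_classical_capacity(2 / 3), 4))   # 0.0817
```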

Contrary to an earlier conjecture of ours, we have found channels for which Q > 0
but C = C_E. One example is a channel mapping three qubits to two qubits which
is switched between two different behaviors by the first input qubit. The channel
operates as follows: the first qubit is measured in the |0⟩, |1⟩ basis. If the result is
|0⟩, then the other two qubits are dephased (i.e., measured in the |0⟩, |1⟩ basis) and
transmitted as classical bits; if the result is |1⟩, the first of the two remaining qubits
is transmitted intact and the second is replaced by the completely mixed state. This
channel has Q = Q_2 = 1 (achieved by setting the first qubit to |1⟩) and C = C_E = 2.

This complex situation naturally raises the question of how many independent
parameters are needed to characterize the important asymptotic, capacity-like properties
of a general quantum channel. A full understanding of quantum channels
would enable us to calculate not only their capacities, but more generally, for any
two channels ℳ and 𝒩, the asymptotic efficiency (possibly zero) with which ℳ can
simulate 𝒩, both alone and in the presence of ancillary resources such as classical
communication or shared entanglement.
One motivation for studying communication in the presence of ancillary resources
is that it can simplify the classification of channels' capacities to simulate one another.
This is so because if a simulation is possible without the ancillary resource, then the
simulation remains possible with it, though not necessarily vice versa. For example,
Q and C represent a channel's asymptotic efficiencies of simulating, respectively, a
noiseless qubit channel and a noiseless classical bit channel. In the absence of ancillary
resources these two capacities can vary independently, subject to the constraint Q ≤
C, but in the presence of unlimited prior shared entanglement, the relation between
them becomes fixed: C_E = 2Q_E, because shared entanglement allows a noiseless
2-bit classical channel to simulate a noiseless 1-qubit channel and vice versa (via
teleportation [?] and superdense coding [?]).
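The superdense-coding direction of C_E = 2Q_E can be checked in a few lines: applying one of I, X, Z or XZ to Alice's half of (|00⟩ + |11⟩)/√2 yields four orthonormal states, which Bob can distinguish perfectly, so one transmitted qubit carries 2 bits. This is a minimal sketch with hypothetical helper names; it checks orthonormality of the encoded states rather than simulating Bob's Bell measurement.

```python
import math

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
XZ = [[0, -1], [1, 0]]          # the product X Z

def kron(a, b):
    """Kronecker product a (x) b of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inner(u, v):
    """Inner product <u|v>."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

bell = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]            # (|00> + |11>)/sqrt(2); first qubit is Alice's
encoded = [apply(kron(P, I), bell) for P in (I, X, Z, XZ)]   # Alice acts only on her qubit
gram = [[inner(u, v) for v in encoded] for u in encoded]
# gram is (numerically) the 4x4 identity: 4 perfectly distinguishable messages = 2 bits.
print(all(abs(gram[i][j] - (i == j)) < 1e-12 for i in range(4) for j in range(4)))  # True
```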

We conjecture that prior entanglement so simplifies the complex landscape of
quantum channels that only a single free parameter remains. Specifically, we conjecture
that in the presence of unlimited prior entanglement, any two quantum channels
of equal C_E could simulate one another with unit asymptotic efficiency. Section
IV proves a classical analog of this conjecture, namely that in the presence of prior
random information shared between sender and receiver, any two discrete memoryless
classical channels (DMC's) of equal capacity can simulate one another with unit
asymptotic efficiency. We call this the classical reverse Shannon theorem because it
establishes the ability of a noiseless classical DMC to simulate noisy ones of equal
capacity, whereas the ordinary Shannon theorem establishes that noisy DMC's can
simulate noiseless ones of equal capacity.

Another ancillary resource, classical communication, also simplifies the landscape
of quantum channels, but probably not so much. The presence of unlimited
classical communication does allow certain otherwise inequivalent pairs of channels to
simulate one another (for example, a noiseless qubit channel and a 50% erasure channel
on 4-dimensional Hilbert space), but it does not render all channels of equal Q_2
asymptotically equivalent. So-called bound-entangled channels [21, 15] have Q_2 = 0,
but unlike classical channels (which also have Q_2 = 0) they can be used to prepare
bound entangled states, which are entangled but cannot be used to prepare any pure
entangled states. Because the distinction between bound entangled and unentangled
states does not vanish asymptotically, even in the presence of unlimited classical
communication [32], bound-entangled and classical channels must be asymptotically
inequivalent, despite having the same Q_2.

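The various capacities of a quantum channel 𝒩 may be defined within a common
framework,

$$C_X(\mathcal{N}) = \lim_{\epsilon\to 0}\,\limsup_{n\to\infty}\left\{\frac{m}{n} : \exists\mathcal{A}\,\exists\mathcal{B}\;\forall\psi\in\Gamma_m\;\; F(\psi,\mathcal{A},\mathcal{B},\mathcal{N}) > 1-\epsilon\right\}. \qquad (1)$$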

Here C_X is a generalized capacity; 𝒜 is an encoding subprotocol, to be performed
by Alice, which receives an m-qubit state ψ belonging to some set Γ_m of allowable
inputs to the entire protocol, and produces n possibly entangled inputs to the channel
𝒩; ℬ is a decoding subprotocol, to be performed by Bob, which receives n (possibly
entangled) channel outputs and produces an m-qubit output for the entire protocol;
finally F(ψ, 𝒜, ℬ, 𝒩) is the fidelity of this output relative to the input ψ, i.e., the
probability that the output state would pass a test determining whether it is equal
to the input (more generally, the fidelity of one mixed state ρ relative to another σ is
F = (tr √(√ρ σ √ρ))²). Different capacities are defined depending on the specification
of Γ, 𝒜 and ℬ. The classical capacities C and C_E are defined by restricting ψ to a
standard orthonormal set of states, without loss of generality the "Boolean" states
labelled by bit strings, Γ_m = {|0⟩, |1⟩}^{⊗m}; for the quantum capacities Q and Q_2, Γ_m
is the entire 2^m-dimensional Hilbert space ℋ_2^{⊗m}. For the simple capacities Q and C,
the Alice and Bob subprotocols are completely positive trace-preserving maps from
ℋ_2^{⊗m} to the input space of 𝒩^{⊗n}, and from the output space of 𝒩^{⊗n} back to ℋ_2^{⊗m}.
For C_E and Q_2, the subprotocols are more complicated, in the first case drawing on
a supply of ebits (maximally entangled pairs of qubits) shared beforehand between
Alice and Bob, and in the latter case making use of a 2-way classical channel between
Alice and Bob. The definition of Q_2 thus includes interactive protocols, in which the
n channel uses do not take place all at once, but may be interspersed with rounds of
classical communication.
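For qubits the fidelity F = (tr √(√ρ σ √ρ))² defined above can be evaluated without matrix square roots, using the standard 2×2 identity F = tr(ρσ) + 2√(det ρ · det σ). The sketch below assumes that identity (it is specific to qubits and is not stated in this paper); higher dimensions would need a numerical matrix square root instead. The function names are illustrative.

```python
import math

def trace_product(a, b):
    """tr(AB) for 2x2 matrices given as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def qubit_fidelity(rho, sigma):
    """F = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2 for 2x2 density matrices."""
    overlap = complex(trace_product(rho, sigma)).real
    dets = max(0.0, complex(det2(rho)).real) * max(0.0, complex(det2(sigma)).real)
    return overlap + 2 * math.sqrt(dets)

half = [[0.5, 0], [0, 0.5]]      # completely mixed state
zero = [[1, 0], [0, 0]]          # |0><0|
plus = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|
print(qubit_fidelity(half, half))   # 1.0
print(qubit_fidelity(zero, plus))   # 0.5, i.e. |<0|+>|^2 for pure states
```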

The classical capacity of a classical discrete memoryless channel is also given by
an expression of the same form, with ψ restricted to Boolean values; the encoder
A, decoder B, and channel N all being restricted to be classical stochastic maps;
and the fidelity F being defined as the probability that the (Boolean) output of
B(N^{⊗n}(A(ψ))) is equal to the input ψ. We will sometimes indicate these restrictions
implicitly by using upper case italic letters (e.g. N) for classical stochastic maps, and
lower case italic letters (e.g. x) for classical discrete data. The definition of classical
capacity would then be

$$C(N) = \lim_{\epsilon\to 0}\,\limsup_{n\to\infty}\left\{\frac{m}{n} : \exists A\,\exists B\;\forall x\in\{0,1\}^m\;\; F(x,A,B,N) > 1-\epsilon\right\}. \qquad (2)$$

A classical stochastic map, or classical channel, may be defined in quantum terms
as one that is completely dephasing in the Boolean basis both with regard to its
inputs and its outputs. A channel, in other words, is classical if and only if it can be
represented as a composition

$$N = \mathcal{D}'\,\mathcal{G}\,\mathcal{D} \qquad (3)$$

of the completely dephasing channel 𝒟 on the input Hilbert space, followed by a
general quantum channel 𝒢, followed by the completely dephasing channel 𝒟' on the
output Hilbert space (a completely dephasing channel is one that makes a von Neumann
measurement in the Boolean basis and resends the result of the measurement).
Dephasing only the inputs, or only the outputs, is in general insufficient to abolish
all quantum properties of a quantum channel 𝒢.
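Eq. (3) can be illustrated numerically: sandwiching a qubit channel 𝒢 between complete dephasings leaves only the transition probabilities P(y|x) = ⟨y|𝒢(|x⟩⟨x|)|y⟩ = Σ_i |⟨y|K_i|x⟩|² for Kraus operators K_i of 𝒢. The sketch below computes this stochastic matrix; choosing amplitude damping with γ = 0.3 as 𝒢, and the function name, are our own assumptions for illustration.

```python
import math

def transition_matrix(kraus):
    """P[x][y] = <y| G(|x><x|) |y> = sum_i |<y|K_i|x>|^2: the classical channel D' G D."""
    d = len(kraus[0])
    return [[sum(abs(k[y][x]) ** 2 for k in kraus) for y in range(d)] for x in range(d)]

gamma = 0.3
kraus = [[[1, 0], [0, math.sqrt(1 - gamma)]],   # amplitude damping: no decay
         [[0, math.sqrt(gamma)], [0, 0]]]       # amplitude damping: decay |1> -> |0>
P = transition_matrix(kraus)
print(P)   # rows x = 0, 1: approximately [[1, 0], [0.3, 0.7]]
```

For this 𝒢 the resulting classical channel is a Z-channel: input 0 is always received correctly, while input 1 flips to 0 with probability γ.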

The notion of capacity may be further generalized to define a capacity of one
channel 𝒩 to simulate another channel ℳ. This may be defined as
Figure 1: A quantum system Q in mixed state ρ is sent through the noisy channel 𝒩,
which may be viewed as a unitary interaction U with an environment E. Meanwhile
a purifying reference system R is sent through the identity channel ℐ. The final joint
state of RQ has the same entropy as the final state ℰ(ρ) of the environment.



$$C_X(\mathcal{N},\mathcal{M}) = \lim_{\epsilon\to 0}\,\limsup_{n\to\infty}\left\{\frac{m}{n} : \exists\mathcal{A}\,\exists\mathcal{B}\;\forall\psi\in\mathcal{H}_{\mathcal{M}}^{\otimes m}\;\; F(\mathcal{M}^{\otimes m}(\psi),\mathcal{A},\mathcal{B},\mathcal{N}) > 1-\epsilon\right\}, \qquad (4)$$

where 𝒜 and ℬ are respectively Alice's and Bob's subprotocols, which together
enable Alice to receive an input ψ in ℋ_ℳ^{⊗m} (the tensor product of m copies of the
input Hilbert space ℋ_ℳ of the channel ℳ to be simulated) and, making n forward
uses of the simulating channel 𝒩, allow Bob to produce some output state, and
F(ℳ^{⊗m}(ψ), 𝒜, ℬ, 𝒩) is the fidelity of this output state with respect to the state that
would have been generated by sending the input ψ through ℳ^{⊗m}.

These definitions of capacity are all asymptotic, depending on the properties of
𝒩^{⊗n} in the limit n → ∞. However, several of the capacities are given by, or closely
related to, non-asymptotic expressions involving input and output entropies for a
single use of the channel. Figure 1 shows a scenario in which a quantum system
Q, initially in mixed state ρ, is sent through the channel, emerging in a mixed state
𝒩(ρ). It is useful to think of the initial mixed state as being part of an entangled pure
state Φ_ρ^{QR}, where R is some reference system that is never operated upon physically.
Similarly the channel can be thought of as a unitary interaction U between the
quantum system Q and some environment subsystem E, which is initially supplied in
a standard pure state |0⟩^E, and leaves the interaction in a mixed state ℰ(ρ)^E. Thus 𝒩
and ℰ are completely positive maps relating the final states of the channel output and
environment, respectively, to the initial state of the channel input, when the initial
state of the environment is held fixed. The mnemonic superscripts Q, R, E indicate,
when necessary, to what system a density operator refers.

Under these circumstances three useful von Neumann entropies may be defined:
the input entropy

$$H(\rho^Q) = -\mathrm{tr}\,\rho^Q\log_2\rho^Q,$$

the output entropy

$$H(\mathcal{N}(\rho)^Q),$$

and the entropy exchange

$$H\big((\mathcal{N}\otimes\mathcal{I})(\Phi_\rho^{QR})\big) = H\big(\mathcal{E}(\rho)^E\big).$$

The complicated left side of the last equation represents the entropy of the joint
state of the subsystem Q which has been through the channel, and the reference
system R, which has not, but may still be more or less entangled with it. The
density operator (𝒩⊗ℐ)Φ_ρ is the quantum analog of a joint input:output probability
distribution, because it has 𝒩(ρ) and ρ as its partial traces. Without the reference
system, the notion of a joint input:output mixed state would be problematic, because
the input and output are not present at the same time, and the no-cloning theorem
prevents Alice from retaining a spare copy of the input to be compared with the one
sent through the channel. The entropy exchange is also equal to the final entropy of
the environment H(ℰ(ρ)), because the tripartite system QRE remains throughout in
a pure state, making its two complementary subsystems E and QR always isospectral.
The relations between these entropies and quantum channels have been well reviewed
by Schumacher and Nielsen [30] and by Holevo and Werner [18].

By Shannon's theorem, the capacity of a classical channel N is the maximum,
over input distributions, of the input:output mutual information, in other words the
input entropy plus the output entropy less the joint entropy of input and output. The
quantum generalization of mutual information for a bipartite mixed state ρ^{AB}, which
reduces to classical mutual information when ρ^{AB} is diagonal in a product basis of
the two subsystems, is

$$H(\rho^A) + H(\rho^B) - H(\rho^{AB}),$$

where

$$\rho^A = \mathrm{tr}_B\,\rho^{AB} \quad\text{and}\quad \rho^B = \mathrm{tr}_A\,\rho^{AB}.$$

In terms of Figure 1, the classical capacity of a classical channel (cf. eq. (2)) can be
expressed as

$$C(N) = \max_{\rho\in\Lambda} H(\rho) + H(N(\rho)) - H\big((N\otimes\mathcal{I})(\Phi_\rho)\big), \qquad (5)$$

where Λ is the class of density operators on the channel's input Hilbert space that
are diagonal in the Boolean basis. The third term (entropy exchange), for a classical
channel N, is just the joint Shannon entropy of the classically correlated Boolean
input and output, because the von Neumann entropies reduce to Shannon entropies
when evaluated in the Schmidt basis of Φ_ρ, with respect to which all states are
diagonal. The restriction to classical inputs ρ ∈ Λ can be removed, because any
off-diagonal elements in ρ would only reduce the first term, while leaving the other two
terms unchanged, by virtue of the diagonality-enforcing properties of the channel.
Thus, the expression

$$\max_\rho H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big), \qquad (6)$$

is a natural generalization to quantum channels 𝒩 of a classical channel's maximal
input:output mutual information, and it is equal to the classical capacity whenever
𝒩 is classical, as defined previously in this section.
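For a classical channel, the maximization in Eq. (5) reduces to maximizing the Shannon mutual information over input distributions. One standard numerical method for this, which this paper neither uses nor discusses, is the Blahut–Arimoto iteration; the sketch below assumes the channel is given as a row-stochastic matrix W[x][y] = P(y|x) and runs a fixed number of iterations instead of testing convergence.

```python
import math

def mutual_information(p, W):
    """I(X;Y) in bits for input distribution p and channel W[x][y] = P(y|x)."""
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(len(p)) for y in range(len(q)) if p[x] > 0 and W[x][y] > 0)

def blahut_arimoto(W, iterations=500):
    """Approximate max_p I(X;Y) in bits; returns (capacity estimate, input distribution)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(iterations):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # Relative entropy D(W(.|x) || q) in bits, one value per input symbol x.
        d = [sum(W[x][y] * math.log2(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        weights = [p[x] * 2 ** d[x] for x in range(nx)]
        total = sum(weights)
        p = [w / total for w in weights]
    return mutual_information(p, W), p

bsc = [[0.9, 0.1], [0.1, 0.9]]          # binary symmetric channel, flip probability 0.1
capacity, p_opt = blahut_arimoto(bsc)
print(round(capacity, 4))                # 0.531, i.e. 1 - h(0.1)
```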

One might hope that this expression continues to give the classical capacity of a
general quantum channel 𝒩, but that is not so, as can be seen by considering the
simple case 𝒩 = ℐ of a noiseless qubit channel. Here the maximum is attained on a
uniform input mixed state ρ = I/2, causing the first two terms each to have the value
1 bit, while the last term is zero, giving a total of 2 bits. This is not the ordinary
classical capacity of the noiseless qubit channel, which is equal to 1 bit, but rather
its entanglement-assisted capacity C_E(𝒩). In the next section we show that this is
true of quantum channels in general, as stated by the following theorem.

Theorem 1 Given a quantum channel 𝒩, the entanglement-assisted capacity C_E
of the channel is equal to the maximal quantum mutual information

$$C_E = \max_\rho H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big). \qquad (7)$$

Here the capacity C_E is defined as the supremum of Eq. (1) when ψ ranges over
Boolean states and 𝒜, ℬ over all protocols where Alice and Bob start with an
arbitrarily large number of shared EPR pairs¹ but have no access to any communication
channels other than 𝒩.
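Theorem 1 can be evaluated in closed form for a qubit Pauli channel 𝒩(ρ) = Σ_i p_i σ_i ρ σ_i (σ_0 = I and σ_1, σ_2, σ_3 = X, Y, Z), under the assumption, standard for such unital and covariant channels, that the uniform input ρ = I/2 attains the maximum in Eq. (7). Then H(ρ) = H(𝒩(ρ)) = 1 and (𝒩⊗ℐ)Φ_ρ is Bell-diagonal with eigenvalues p_i, so C_E = 2 − H(p_0, p_1, p_2, p_3). The sketch reproduces the C_E column of Table 1, reading "2/3 depolarizing" as Pauli probabilities (1/2, 1/6, 1/6, 1/6) and 100% dephasing as (1/2, 0, 0, 1/2); function names are illustrative.

```python
import math

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def ce_pauli(probs):
    """C_E = 2 - H(p0, p1, p2, p3) for a qubit Pauli channel.

    Assumes the uniform input rho = I/2 is optimal in Eq. (7); then
    (N (x) I)Phi_rho is Bell-diagonal with eigenvalues probs.
    """
    if abs(sum(probs) - 1) > 1e-12:
        raise ValueError("Pauli probabilities must sum to 1")
    return 2 - shannon(probs)

print(ce_pauli([1, 0, 0, 0]))                      # noiseless qubit: 2.0
print(round(ce_pauli([1/2, 1/6, 1/6, 1/6]), 4))    # 2/3 depolarizing: 0.2075
print(ce_pauli([1/2, 0, 0, 1/2]))                  # 100% dephasing: 1.0
```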



Another capacity theorem which has been proven for quantum channels is the
Holevo-Schumacher-Westmoreland theorem [19, 31], which says that if the signals
that Bob receives are constrained to lie in a set of quantum states ρ'_i, where Alice
chooses i with probability p_i (for example, by supplying input state ρ_i to the channel 𝒩,
so that ρ'_i = 𝒩(ρ_i)), then the capacity is given by

$$C_H(\{\rho'_i\}) = H\Big(\sum_i p_i\rho'_i\Big) - \sum_i p_i H(\rho'_i). \qquad (8)$$

This gives a means to calculate a constrained classical capacity for a quantum
channel 𝒩 if the sender is not allowed to use entangled inputs: the channel's Holevo
capacity C_H(𝒩) is defined as the maximum of C_H({𝒩(ρ_i)}) over all possible sets
of input states {ρ_i} and probabilities {p_i}. We will be using this theorem extensively
in the proof of our entanglement-assisted capacity bound.
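Eq. (8) is easy to evaluate for qubit signal states, because the eigenvalues of a 2×2 Hermitian matrix follow from its trace and determinant. The sketch below computes the quantity in Eq. (8) for a given ensemble; the example ensemble (|0⟩ and |+⟩ sent with equal probability) and the helper names are illustrative assumptions.

```python
import math

def eig2(m):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    t = (m[0][0] + m[1][1]).real
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(0.0, t * t / 4 - d))
    return t / 2 + disc, t / 2 - disc

def vn_entropy(m):
    """Von Neumann entropy in bits of a 2x2 density matrix."""
    return -sum(l * math.log2(l) for l in eig2(m) if l > 1e-15)

def holevo_chi(probs, states):
    """H(sum_i p_i rho_i) - sum_i p_i H(rho_i) for qubit states, as in Eq. (8)."""
    avg = [[sum(p * s[i][j] for p, s in zip(probs, states)) for j in range(2)] for i in range(2)]
    return vn_entropy(avg) - sum(p * vn_entropy(s) for p, s in zip(probs, states))

ket0 = [[1, 0], [0, 0]]
plus = [[0.5, 0.5], [0.5, 0.5]]
print(round(holevo_chi([0.5, 0.5], [ket0, plus]), 4))   # 0.6009
```

Because both signals are pure, the second term vanishes and the result is just the entropy of the average state, about 0.6009 bits, less than the 1 bit that orthogonal signals would give.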

In our original paper [?], we proved the formula (7) for certain special cases,
including the depolarizing channel and the erasure channel. We did this by sandwiching
the entanglement-assisted capacity between two other capacities, which for certain
channels turned out to be equal. The higher of these two capacities we called the
forward classical communication cost via teleportation (FCCC_Tp), which is the amount
of forward classical communication needed to simulate the channel 𝒩 by teleporting
over a noisy classical channel. The lower of these two bounds we called C_sd, which
is the capacity obtained by using the noisy quantum channel 𝒩 in the superdense
coding protocol. We have that C_sd ≤ C_E ≤ FCCC_Tp. Thus, if C_sd = FCCC_Tp for
a channel, we have obtained the entanglement-assisted capacity of the channel. In
order for this argument to work, we needed the classical reverse Shannon theorem,
which says that a noisy classical channel can be simulated by a noiseless classical
channel of the same capacity, as long as the sender and receiver have access to shared
random bits. We needed this theorem because the causality argument showing that
EPR pairs do not add to the capacity of a classical channel appears to work only for
noiseless channels. We sketched the proof of the classical reverse Shannon theorem
in our previous paper, and give it in full in this paper.

¹ It is sufficient to use standard EPR pairs (maximally entangled two-qubit states) as the
entanglement resource because any other entangled state can be efficiently prepared from EPR pairs
by the process of entanglement dilution using an asymptotically negligible o(n) amount of forward
classical communication [27].

In our previous paper, the bounds C_sd and FCCC_Tp are both computed using
single-symbol protocols; that is, both the superdense coding protocol and the simulation
of the channel by teleportation via a noisy classical channel are carried out with
a single use of the channel. The capacity is then obtained using the classical Shannon
formula for a classical channel associated with these protocols. In this paper, we
obtain bounds using multiple-symbol protocols, which perform entangled operations
on many uses of the channel. We then perform the capacity computations using the
Holevo-Schumacher-Westmoreland formula (8).

II. Formula for Entanglement-Assisted Classical Capacity

Assume we have a quantum channel 𝒩 which maps a Hilbert space ℋ_in to another
Hilbert space ℋ_out. Let C_E be the classical capacity of the channel when the sender
and receiver have an unbounded supply of EPR pairs to use in the communication
protocol. This section proves that the entanglement-assisted capacity of a channel is
the maximum quantum mutual information attainable between the two parts of an
entangled quantum state, one part of which has been passed through the channel.
That is,

$$C_E(\mathcal{N}) = \max_{\rho\in\mathcal{H}_{\mathrm{in}}} H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big), \qquad (9)$$

where H(ρ) denotes the von Neumann entropy of a density matrix ρ on ℋ_in, H(𝒩(ρ))
denotes the von Neumann entropy of the output when ρ is input into the channel,
and H((𝒩⊗ℐ)Φ_ρ) denotes the von Neumann entropy of a purification Φ_ρ of ρ over a
reference system ℋ_ref, half of which (ℋ_in) has been sent through the channel 𝒩 while
the other half (ℋ_ref) has been sent through the identity channel ℐ (this corresponds to
the portion of the entangled state that Bob holds at the start of the protocol). Here,
we have Φ_ρ ∈ ℋ_in ⊗ ℋ_ref and Tr_ref Φ_ρ = ρ. All purifications of ρ give the same entropy
in this formula², so we need not specify which one we use. As pointed out earlier, the
right-hand side of Eq. (9) parallels the expression for the capacity of a classical channel
as the maximum, over input distributions, of the input:output mutual information.
Lindblad [?], Barnum et al. [?], and Adami and Cerf [12] characterized several
important properties of the quantum mutual information, including positivity, additivity
and the data processing inequality. Adami and Cerf argued that the right side
of Eq. (9) represents an important channel property, calling it the channel's "von
Neumann capacity", but they did not identify the communication task for which this
quantity measures the channel's asymptotic efficiency. Now we know that
it is the channel's efficiency for transmitting classical information when the sender
and receiver share prior entanglement.

In our demonstration that Eq. (9) is indeed the correct expression for
entanglement-assisted classical capacity, the first subsection gives an entanglement-assisted
classical communication protocol which can asymptotically achieve a rate equal to the
right-hand side of Eq. (9) minus ε, for any ε > 0. The second subsection gives a proof of
a crucial lemma on typical subspaces needed in the first subsection. The third subsection
shows that the right-hand side of Eq. (9) is indeed an upper bound for C_E(𝒩). The fourth
subsection proves several entropy inequalities that are used in the third subsection.

A. Proof of the Lower Bound

In this subsection, we prove the inequality

$$C_E(\mathcal{N}) \ge \max_\rho H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big). \qquad (10)$$

We first show the inequality

$$C_E(\mathcal{N}) \ge H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big) \qquad (11)$$

for the special case where ρ = I/d, where d = dim ℋ_in, I is the identity matrix, and
Φ_ρ is a maximally entangled state. We then use this special case to show that the
inequality (11) still holds when ρ is proportional to any projection matrix. We finally
use this case to prove the inequality in the general case of arbitrary ρ, showing (10);
we do this by taking ρ' to be the normalized projection onto the typical subspace
of ρ^{⊗n}, and using ρ' and 𝒩^{⊗n} in the inequality (11).

The coding protocol we use for the special case given above, where ρ = I/d, is
essentially the same as the protocol used for quantum superdense coding [?], a
procedure that yields the entanglement-assisted capacity in the case of a noiseless
quantum channel. The proof that the formula (11) holds for ρ = I/d, however, is quite
different from and somewhat more complicated than the proof that superdense coding
works. Our proof uses Holevo's formula (8) for the classical capacity of a quantum
channel with given signal states to compute the capacity achieved by our protocol. This
protocol is the same as that given in our earlier paper on C_E [?], although our proof is
different; the earlier proof only applied to certain quantum channels, such as those that
commute with teleportation.

² This is a consequence of the fact that any two purifications of a given density matrix can be
mapped to each other by a unitary transformation of the reference system [?].

We need to use the generalization of the Pauli matrices to d dimensions. These
are the matrices used in the d-dimensional quantum teleportation scheme [?]. There
are d² of these matrices, which are given by U_{j,k} = T^j R^k, for the matrices T and R
defined by their entries as

$$T_{a,b} = \delta_{a,\,b-1 \bmod d} \quad\text{and}\quad R_{a,b} = e^{2\pi i a/d}\,\delta_{a,b}, \qquad (12)$$

as in [?]. To achieve the capacity given by the above formula (11) with ρ = I/d,
Alice and Bob start by sharing a d-dimensional maximally entangled state Φ. Alice
applies one of the d² transformations U_{j,k} to her part of Φ, and then sends it through
the channel 𝒩. Bob gets one of the d² quantum states (𝒩⊗ℐ)((U_{j,k}⊗I)Φ(U_{j,k}⊗I)†).
It is straightforward to show that averaging over the matrices U_{j,k} effectively
disentangles Alice's and Bob's pieces, so we obtain

$$\frac{1}{d^2}\sum_{j,k=1}^{d}(\mathcal{N}\otimes\mathcal{I})\big((U_{j,k}\otimes I)\,\Phi\,(U_{j,k}\otimes I)^\dagger\big) = \mathcal{N}(\mathrm{Tr}_B\,\Phi)\otimes\mathrm{Tr}_A\,\Phi = \mathcal{N}(\rho)\otimes\rho, \qquad (13)$$

where ρ = I/d. The entropy of this quantity is the first term of Holevo's formula (8) and
gives the first two terms of (11). The entropy of each of the d² states
(𝒩⊗ℐ)((U_{j,k}⊗I)Φ(U_{j,k}⊗I)†) is H((𝒩⊗ℐ)(Φ_ρ)), since each of the (U_{j,k}⊗I)Φ is a
purification of ρ. This entropy is the second term of Holevo's formula (8), and gives the
third term of (11). We thus obtain the formula when ρ = I/d.
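The two facts used in this argument, that the d² encoded states (U_{j,k}⊗I)Φ are orthonormal and that averaging over U_{j,k} disentangles Alice's and Bob's halves, can be checked numerically for small d. The sketch below builds T and R from Eq. (12) (with indices 0 to d − 1, which gives the same set of operators since T^d = R^d = I), verifies tr(U_{j,k}†U_{j',k'}) = d δ_{jj'}δ_{kk'}, and verifies the twirl identity (1/d²) Σ_{j,k} U_{j,k} A U_{j,k}† = tr(A) I/d for a test matrix A; applied to Alice's half of Φ, that identity gives the noiseless-channel case of Eq. (13). The choice d = 3, the test matrix and the helper names are arbitrary.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices (nested lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def matpow(a, e):
    """a**e by repeated multiplication (e >= 0)."""
    n = len(a)
    out = [[complex(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        out = matmul(out, a)
    return out

def generalized_paulis(d):
    """U[(j, k)] = T^j R^k with T and R as in Eq. (12)."""
    T = [[complex(a == (b - 1) % d) for b in range(d)] for a in range(d)]
    R = [[cmath.exp(2j * cmath.pi * a / d) if a == b else 0j for b in range(d)] for a in range(d)]
    return {(j, k): matmul(matpow(T, j), matpow(R, k)) for j in range(d) for k in range(d)}

def hs_inner(a, b):
    """Hilbert-Schmidt inner product tr(A^dagger B)."""
    n = len(a)
    return sum(a[i][j].conjugate() * b[i][j] for i in range(n) for j in range(n))

d = 3
U = generalized_paulis(d)

# tr(U_jk^dagger U_j'k') = d * delta, so the d^2 states (U_jk (x) I)Phi are orthonormal.
orthogonal = all(abs(hs_inner(U[s], U[t]) - (d if s == t else 0)) < 1e-9 for s in U for t in U)

# Twirl: (1/d^2) sum_jk U_jk A U_jk^dagger = tr(A) I/d for an arbitrary test matrix A.
A = [[complex(i + 2 * j, i - j) for j in range(d)] for i in range(d)]
twirl = [[0j] * d for _ in range(d)]
for u in U.values():
    term = matmul(matmul(u, A), dagger(u))
    for i in range(d):
        for j in range(d):
            twirl[i][j] += term[i][j] / d**2
trace_A = sum(A[i][i] for i in range(d))
print(orthogonal)                                        # True
print([round(twirl[i][i].real, 9) for i in range(d)])    # [3.0, 3.0, 3.0]
```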

The next step is to note that the inequality (11) also holds if the density matrix ρ
is proportional to a projection onto any subspace of ℋ_in. The proof is exactly the same
as for ρ = I/d. In fact, one can prove this case by using the above result. By restricting
ℋ_in to the support of ρ, which we denote by ℋ', and by restricting 𝒩 to act only on ℋ',
we obtain a channel 𝒩' for which ρ = I/dim ℋ'.

We now must show that (11) holds for arbitrary ρ. This is the most difficult part
of the proof. For this step we need a little more notation. Recall that we can assume
that any quantum map 𝒩 can be implemented via a unitary transformation U acting
on the system ℋ_in and some environment system ℋ_env, where ℋ_env starts in some
fixed initial state. We introduce ℰ, which is the completely positive map taking ℋ_in
to ℋ_env by first applying U and tracing out everything but ℋ_env. We then have

$$H(\mathcal{E}(\rho)) = H\big((\mathcal{N}\otimes\mathcal{I})\Phi_\rho\big), \qquad (14)$$

where ρ is a density matrix over ℋ_in and Φ_ρ is a purification of ρ. Recall (from
footnote 2) that this does not depend on which purification Φ_ρ of ρ is used.
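As a concrete instance of the environment map ℰ and of Eq. (14), consider the qubit amplitude damping channel with damping probability γ and Kraus operators K_0 = |0⟩⟨0| + √(1−γ)|1⟩⟨1| and K_1 = √γ|0⟩⟨1|, for which ℰ(ρ)_{ij} = tr(K_i ρ K_j†). Assuming that diagonal inputs ρ = diag(1 − q, q) suffice (which follows from the channel's phase covariance together with concavity of the quantum mutual information in ρ), the quantity maximized in Eq. (9) becomes h(q) + h((1 − γ)q) − h(γq). The sketch below computes ℰ(ρ) from the Kraus operators and maximizes over q by ternary search, which relies on that concavity; it is an illustrative sketch under these assumptions, with hypothetical function names.

```python
import math

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy2(m):
    """Von Neumann entropy (bits) of a 2x2 Hermitian matrix via trace and determinant."""
    t = (m[0][0] + m[1][1]).real
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(0.0, t * t / 4 - d))
    return -sum(l * math.log2(l) for l in (t / 2 + disc, t / 2 - disc) if l > 1e-15)

def environment_state(kraus, rho):
    """E(rho)_ij = tr(K_i rho K_j^dagger), the complementary (environment) output."""
    def tr_k_rho_kdag(ki, kj):
        return sum(ki[a][b] * rho[b][c] * kj[a][c].conjugate()
                   for a in range(2) for b in range(2) for c in range(2))
    return [[tr_k_rho_kdag(ki, kj) for kj in kraus] for ki in kraus]

def ce_amplitude_damping(gamma, iters=200):
    """Max over diagonal inputs diag(1-q, q) of H(rho) + H(N(rho)) - H(E(rho))."""
    s, r = math.sqrt(1 - gamma), math.sqrt(gamma)
    kraus = [[[1, 0], [0, s]], [[0, r], [0, 0]]]

    def objective(q):
        rho = [[1 - q, 0], [0, q]]
        # Channel output is diag(1 - (1 - gamma) q, (1 - gamma) q).
        return h(q) + h((1 - gamma) * q) - entropy2(environment_state(kraus, rho))

    lo, hi = 0.0, 1.0
    for _ in range(iters):                      # ternary search on a concave function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)

for gamma in (0.0, 0.5, 1.0):
    print(gamma, round(ce_amplitude_damping(gamma), 6))
```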
As our argument involves typical subspaces, we first give some facts about typical
subspaces. For technical reasons³ we use frequency-typical subspaces. For any ε and
δ there is a large enough n such that the Hilbert space ℋ^{⊗n} contains a typical subspace
T (which is the span of typical eigenvectors of ρ^{⊗n}) such that

1. Tr Π_T ρ^{⊗n} Π_T > 1 − ε,

2. The eigenvalues λ of Π_T ρ^{⊗n} Π_T on T satisfy

$$2^{-n(H(\rho)+\delta)} \le \lambda \le 2^{-n(H(\rho)-\delta)},$$

3. (1 − ε) 2^{n(H(ρ)−δ)} ≤ dim T ≤ 2^{n(H(ρ)+δ)}.
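The three properties can be checked numerically in the simplest case, a qubit state ρ = diag(1 − q, q), where the eigenvectors of ρ^{⊗n} are labelled by bit strings and frequency-typicality means that the fraction of 1s lies within η of q. Each typical eigenvalue then has the form 2^{−ne} with |e − H(ρ)| ≤ δ = η log2((1 − q)/q) (for q < 1/2). The sketch below is our own illustration; the names and the parameters q = 0.1, n = 1000, η = 0.03 are arbitrary choices.

```python
import math

def binary_entropy(q):
    """Shannon entropy of (1 - q, q) in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def frequency_typical_stats(q, n, eta):
    """Properties of the frequency-typical subspace of rho^{(x)n}, rho = diag(1 - q, q).

    Returns (weight, log2_dim, min_exp, max_exp): weight = Tr P_T rho^{(x)n} P_T,
    log2_dim = log2 dim T, and every typical eigenvalue equals 2**(-n * e)
    with min_exp <= e <= max_exp.
    """
    weight, dim, exps = 0.0, 0, []
    for k in range(n + 1):                      # k = number of 1s in the bit string
        if abs(k / n - q) > eta:
            continue
        ln_eig = k * math.log(q) + (n - k) * math.log(1 - q)      # ln of one eigenvalue
        ln_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        weight += math.exp(ln_count + ln_eig)   # C(n, k) eigenvalues of this size
        dim += math.comb(n, k)
        exps.append(-ln_eig / (n * math.log(2)))
    return weight, math.log2(dim), min(exps), max(exps)

q, n, eta = 0.1, 1000, 0.03
H = binary_entropy(q)
delta = eta * math.log2((1 - q) / q)
weight, log2_dim, lo, hi = frequency_typical_stats(q, n, eta)
print(weight > 0.99)                                                          # property 1: True
print(H - delta - 1e-9 <= lo <= hi <= H + delta + 1e-9)                       # property 2: True
print(math.log2(weight) + n * (H - delta) <= log2_dim <= n * (H + delta))    # property 3: True
```

Since π_{T_n} is uniform on T_n, H(π_{T_n}) = log2 dim T_n, so `log2_dim / n` approaches H(ρ) as n grows and η shrinks.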

Let T_n ⊂ ℋ^{⊗n} be the typical subspace corresponding to ρ^{⊗n}, and let π_{T_n} be the
normalized density matrix proportional to the projection onto T_n. It follows from
well-known facts about typical subspaces that

$$\lim_{n\to\infty}\frac{1}{n}H(\pi_{T_n}) = H(\rho).$$

We can also show the following lemma. We delay giving the proof of this lemma until 
after the proof of the theorem. 

Lemma 1 Let 𝒩 be a noisy quantum channel and ρ a density matrix on the input
space of this channel. Then we can find a sequence of frequency-typical subspaces T_n
corresponding to ρ^{⊗n}, such that if π_{T_n} is the unit-trace density matrix proportional
to the projection onto T_n, then

$$\lim_{n\to\infty}\frac{1}{n}H\big(\mathcal{N}^{\otimes n}(\pi_{T_n})\big) = H(\mathcal{N}(\rho)). \qquad (15)$$

Applying the lemma to the map onto the environment similarly gives 

lim -H(S^ Tn )) = H(S(p)). (16) 

n — >oo fi 

Thus, if we consider the quantity

$$\frac{1}{n}\Big[ H(\pi_{\mathcal{T}_n}) + H\big(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_n})\big) - H\big(\mathcal{E}^{\otimes n}(\pi_{\mathcal{T}_n})\big) \Big] \qquad (17)$$

we see that it converges to

$$H(\rho) + H(\mathcal{N}(\rho)) - H(\mathcal{E}(\rho)), \qquad (18)$$

which the identity (14) shows is equal to the desired quantity (9). This concludes the proof of the lower bound.

³ Our proof of Lemma 1 does not appear to work for entropy-typical subspaces unless these subspaces are modified by imposing a somewhat unnatural-looking extra condition. This will be discussed later.
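To get a concrete feel for the limit (18), the quantity can be evaluated in closed form for the amplitude damping channel (one of the examples treated in Section III) when $\rho = {\rm diag}(1-p, p)$. This sketch assumes the standard Kraus operators $A_0 = {\rm diag}(1,\sqrt{1-\gamma})$ and $A_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. With them, both $\mathcal{N}(\rho) = {\rm diag}(1-(1-\gamma)p, (1-\gamma)p)$ and $\mathcal{E}(\rho) = {\rm diag}(1-\gamma p, \gamma p)$ are diagonal, so (18) reduces to binary entropies. The function names are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def quantity_18(p, gamma):
    """H(rho) + H(N(rho)) - H(E(rho)) for rho = diag(1-p, p) under amplitude
    damping with damping probability gamma (assumed Kraus form)."""
    h_in = h2(p)                 # H(rho)
    h_out = h2((1 - gamma) * p)  # N(rho) = diag(1 - (1-gamma)p, (1-gamma)p)
    h_env = h2(gamma * p)        # E(rho) = diag(1 - gamma p, gamma p)
    return h_in + h_out - h_env


for gamma in (0.0, 0.3, 1.0):
    print(gamma, quantity_18(0.5, gamma))
```

For $\gamma = 0$ (a noiseless qubit) the value at $p = 1/2$ is 2 bits, and for $\gamma = 1$ it vanishes, as expected.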

One more matter to be cleared up is the form of the prior entanglement to be shared by Alice and Bob. The most standard form of entanglement is maximally entangled pairs of qubits ("ebits"), and it is natural to use them as the entanglement resource in defining $C_E$. However, Eq. (9) involves the entangled state $\Phi_\rho$, which is typically not a product of ebits. This is no problem, because, as Lo and Popescu [27] showed, many copies of two entangled pure states having an equal entropy of entanglement can be interconverted not only with unit asymptotic efficiency, but in a way that requires an asymptotically negligible amount of (one-way) classical communication, compared to the amount of entanglement processed. Thus the definition of $C_E$ is independent of the form of the entanglement resource, so long as it is a pure state. As it turns out, the lower bound proof does not actually require construction of $\Phi_\rho$ itself, but merely a sequence of maximally entangled states on high-dimensional typical subspaces $\mathcal{T}_n$ of tensor powers of $\Phi_\rho$. These maximally entangled states can be prepared from standard ebits with arbitrarily high fidelity and no classical communication.

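B. Proof of Lemma 1

In this section, we prove

Lemma 1 Suppose $\rho$ is a density matrix over a Hilbert space $\mathcal{H}$ of dimension $d$, and $\mathcal{N}$, $\mathcal{E}$ are two trace-preserving completely positive maps. Then there is a sequence of frequency-typical subspaces $\mathcal{T}_n \subset \mathcal{H}^{\otimes n}$ corresponding to $\rho^{\otimes n}$ such that

$$\lim_{n\to\infty} \frac{1}{n} \log \dim \mathcal{T}_n = H(\rho), \qquad (19)$$

$$\lim_{n\to\infty} \frac{1}{n} H\big(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_n})\big) = H(\mathcal{N}(\rho)), \qquad (20)$$

and

$$\lim_{n\to\infty} \frac{1}{n} H\big(\mathcal{E}^{\otimes n}(\pi_{\mathcal{T}_n})\big) = H(\mathcal{E}(\rho)), \qquad (21)$$

where $\pi_{\mathcal{T}_n}$ is the projection matrix onto $\mathcal{T}_n$ normalized to have trace 1.

For simplicity, we will prove this lemma with only the conditions (19) and (20). Altering the proof to also obtain the condition (21) is straightforward, as we treat the map $\mathcal{E}$ in exactly the same manner as the map $\mathcal{N}$, and need only make sure that both formulas (20) and (21) converge for the same sequence of subspaces $\mathcal{T}_n$.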

Our proof is based on several previous results in quantum information theory. For the proof of the $\le$ direction in Eq. (20), we show that a source producing states with average density matrix $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_n})$ can be compressed into $nH(\mathcal{N}(\rho)) + o(n)$ qubits per state, with the property that the original source output can be recovered with high fidelity. Schumacher's theorem [23, 29] shows that the number of qubits per signal (the logarithm of the dimension) needed for asymptotically faithful encoding of a quantum source is equal to the entropy of the density matrix of the source; this gives the upper bound on $H(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_n}))$. For the proof of the $\ge$ direction of Eq. (20), we need the theorem of Hausladen et al. [17] that the classical capacity of signals transmitting pure quantum states is the entropy of the density matrix of the average state transmitted (this is a special case of Holevo's formula (8)). We give a communication protocol which transmits a classical message containing $nH(\mathcal{N}(\rho)) - o(n)$ bits using pure states. By applying the theorem of Hausladen et al. to this communication protocol, we deduce a lower bound on the entropy $H(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_n}))$.

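Proof: We first need some notation. Let the eigenvalues and eigenvectors of $\rho$ be $\lambda_j$ and $|v_j\rangle$, with $1 \le j \le d$. Let the noisy channel $\mathcal{N}$ map a $d$-dimensional space to a $d_{\rm out}$-dimensional space. Choose a Kraus representation for $\mathcal{N}$, so that

$$\mathcal{N}(\sigma) = \sum_{k=1}^{c} A_k \sigma A_k^\dagger,$$

where $c \le d\, d_{\rm out}$ and $\sum_{k=1}^{c} A_k^\dagger A_k = I$. Then we have

$$\mathcal{N}(\rho) = \sum_{j=1}^{d} \sum_{k=1}^{c} \lambda_j A_k |v_j\rangle\langle v_j| A_k^\dagger. \qquad (22)$$

We let

$$\mu_{j,k} = \big\| A_k |v_j\rangle \big\|^2 \qquad (23)$$

and

$$|u_{j,k}\rangle = \frac{1}{\| A_k |v_j\rangle \|}\, A_k |v_j\rangle \qquad (24)$$

(terms with $\mu_{j,k} = 0$ are omitted), so that

$$\mathcal{N}(\rho) = \sum_{j=1}^{d} \sum_{k=1}^{c} \lambda_j \mu_{j,k} |u_{j,k}\rangle\langle u_{j,k}|.$$

We need notation for the eigenstates and eigenvalues of $\mathcal{N}(\rho)$. Let these be $|w_k\rangle$ and $\omega_k$, $1 \le k \le d_{\rm out}$. Finally, we define the probability $p_{jk}$, $1 \le j \le d$, $1 \le k \le d_{\rm out}$, by

$$p_{jk} = \langle w_k| \mathcal{N}(|v_j\rangle\langle v_j|) |w_k\rangle. \qquad (25)$$

This is the probability that if the eigenstate $|v_j\rangle$ of $\rho$ is sent through the channel $\mathcal{N}$ and measured in the eigenbasis of $\mathcal{N}(\rho)$, the eigenstate $|w_k\rangle$ will be observed. Note that

$$\sum_{j=1}^{d} \lambda_j p_{jk} = \langle w_k| \mathcal{N}(\rho) |w_k\rangle = \omega_k. \qquad (26)$$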
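The notation of Eqs. (22) to (26) can be exercised on a toy case. This sketch assumes the amplitude damping Kraus operators $A_0 = {\rm diag}(1,\sqrt{1-\gamma})$, $A_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$ and a diagonal $\rho = {\rm diag}(1-p, p)$, so that $\mathcal{N}(\rho)$ is also diagonal and its eigenbasis $|w_k\rangle$ is the computational basis; names such as `p_jk` are illustrative only. It computes $\mu_{j,k}$ and $p_{jk}$ and checks trace preservation ($\sum_k \mu_{j,k} = 1$) and Eq. (26).

```python
import math


def apply(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def channel(kraus, m):
    """N(m) = sum_k A_k m A_k^T for real Kraus matrices A_k."""
    d_in, d_out = len(m), len(kraus[0])
    return [[sum(A[i][a] * m[a][b] * A[j][b]
                 for A in kraus for a in range(d_in) for b in range(d_in))
             for j in range(d_out)] for i in range(d_out)]


gamma, p = 0.4, 0.3
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]],
         [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]
lam = [1 - p, p]                   # eigenvalues lambda_j of rho = diag(1-p, p)
v = [[1.0, 0.0], [0.0, 1.0]]       # eigenvectors |v_j> of rho
w = [[1.0, 0.0], [0.0, 1.0]]       # eigenvectors |w_k> of N(rho), diagonal here

# Eq. (23): mu_{j,k} = ||A_k |v_j>||^2
mu = [[sum(x * x for x in apply(A, vj)) for A in kraus] for vj in v]


def p_jk(j, k):
    """Eq. (25): <w_k| N(|v_j><v_j|) |w_k>."""
    out = channel(kraus, [[v[j][a] * v[j][b] for b in range(2)] for a in range(2)])
    return sum(w[k][a] * out[a][b] * w[k][b] for a in range(2) for b in range(2))


P = [[p_jk(j, k) for k in range(2)] for j in range(2)]
n_rho = channel(kraus, [[lam[0], 0.0], [0.0, lam[1]]])
omega = [n_rho[k][k] for k in range(2)]   # eigenvalues omega_k (N(rho) is diagonal)
print(mu, P, omega)
```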
We now define the typical subspace $\mathcal{T}_{n,\delta,\rho}$. Most previous papers on quantum information theory have dealt with entropy-typical subspaces. We use frequency-typical subspaces, which are similar, but have properties that make the proof of this lemma somewhat simpler.

A frequency-typical subspace of $\mathcal{H}^{\otimes n}$ associated with the density matrix $\rho$ on $\mathcal{H}$ is defined as the subspace spanned by certain eigenstates of $\rho^{\otimes n}$. We assume that $\rho$ has all positive eigenvalues. (If it has some zero eigenvalues, we restrict to the support of $\rho$, and find the corresponding typical subspace of ${\rm supp}(\rho)^{\otimes n}$, which will now have all positive eigenvalues.) The eigenstates of $\rho^{\otimes n}$ are tensor product sequences of eigenvectors of $\rho$, that is, $|v_{\alpha_1}\rangle \otimes |v_{\alpha_2}\rangle \otimes \cdots \otimes |v_{\alpha_n}\rangle$. Let $|s\rangle$ be one of these eigenstates of $\rho^{\otimes n}$. We will say $|s\rangle$ is frequency typical if each eigenvector $|v_j\rangle$ appears in the sequence $|s\rangle$ approximately $n\lambda_j$ times. Specifically, an eigenstate $|s\rangle$ is $\delta$-typical if

$$\big| N_{|v_j\rangle}(|s\rangle) - \lambda_j n \big| \le \delta n \qquad (27)$$

for all $j$; here $N_{|v_j\rangle}(|s\rangle)$ is the number of times that $|v_j\rangle$ appears in $|s\rangle$. The frequency-typical subspace $\mathcal{T}_{n,\delta,\rho}$ is the subspace of $\mathcal{H}^{\otimes n}$ that is spanned by all $\delta$-typical eigenvectors $|s\rangle$ of $\rho^{\otimes n}$.

We define $\Pi_{\mathcal{T}}$ to be the projection onto the subspace $\mathcal{T}$, and $\pi_{\mathcal{T}}$ to be this projection normalized to have trace 1, that is, $\pi_{\mathcal{T}} = (\dim \mathcal{T})^{-1}\, \Pi_{\mathcal{T}}$.
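The definition (27) translates directly into a counting test. This sketch (hypothetical function names, a made-up three-level spectrum) enumerates all $d^n$ eigenstates of $\rho^{\otimes n}$ for a small $n$ and compares the brute-force count of $\delta$-typical ones with a count grouped by composition (the multiset of eigenvector labels), which is how $\dim \mathcal{T}_{n,\delta,\rho}$ would be computed for larger $n$.

```python
import itertools
import math


def is_delta_typical(seq, lam, delta):
    """Eq. (27): |N_{v_j}(s) - lambda_j n| <= delta n for every j."""
    n = len(seq)
    return all(abs(seq.count(j) - lam[j] * n) <= delta * n for j in range(len(lam)))


def typical_dimension_brute(lam, delta, n):
    """Count delta-typical eigenvectors of rho^{(x)n} by full enumeration."""
    return sum(1 for s in itertools.product(range(len(lam)), repeat=n)
               if is_delta_typical(s, lam, delta))


def typical_dimension_types(lam, delta, n):
    """Same count, summing multinomial coefficients over typical compositions."""
    total = 0
    for counts in itertools.product(range(n + 1), repeat=len(lam)):
        if sum(counts) != n:
            continue
        if all(abs(c - l * n) <= delta * n for c, l in zip(counts, lam)):
            multinom = math.factorial(n)
            for c in counts:
                multinom //= math.factorial(c)
            total += multinom
    return total


lam, delta, n = [0.5, 0.3, 0.2], 0.1, 8
print(typical_dimension_brute(lam, delta, n), typical_dimension_types(lam, delta, n))
```

Here the typical compositions are $(4,2,2)$ and $(4,3,1)$, so both counts equal $420 + 280 = 700$.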

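From the theory of typical sequences [14], for any density matrix $\sigma$, any $\epsilon > 0$ and $\delta > 0$, one can choose $n$ large enough so that

1. $\mathrm{Tr}\, \Pi_{\mathcal{T}_{n,\delta,\sigma}} \sigma^{\otimes n} \Pi_{\mathcal{T}_{n,\delta,\sigma}} > 1 - \epsilon$.

2. The eigenvalues $\lambda$ of $\Pi_{\mathcal{T}_{n,\delta,\sigma}} \sigma^{\otimes n} \Pi_{\mathcal{T}_{n,\delta,\sigma}}$ on $\mathcal{T}_{n,\delta,\sigma}$ satisfy
$$2^{-n(H(\sigma)+\delta')} \le \lambda \le 2^{-n(H(\sigma)-\delta')},$$
where $\delta' = \delta\, d \log(\lambda_{\max}/\lambda_{\min})$, and $\lambda_{\max}$ ($\lambda_{\min}$) is the maximum (minimum) eigenvalue of $\sigma$.

3. $(1-\epsilon)\, 2^{n(H(\sigma)-\delta')} \le \dim \mathcal{T}_{n,\delta,\sigma} \le 2^{n(H(\sigma)+\delta')}$.

The property (1) follows from the law of large numbers, and (2), (3) are straightforward consequences of (1) and the definition of typical subspace.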
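Properties (2) and (3) can be verified exactly for a qubit $\sigma$, because every eigenstate with the same number $k$ of $|v_1\rangle$ factors has the same eigenvalue. The sketch below assumes $d = 2$, uses base-2 logarithms throughout, and takes $1-\epsilon$ to be the trace appearing in property (1); the function name is illustrative.

```python
import math


def check_typical_properties(lam, delta, n):
    """Exact check of properties (2) and (3) for T_{n,delta,sigma} with a
    qubit sigma = diag(lam[0], lam[1]); also returns the trace in property (1)."""
    H = -sum(l * math.log2(l) for l in lam)
    delta_p = delta * len(lam) * math.log2(max(lam) / min(lam))  # delta'
    dim, trace, ok = 0, 0.0, True
    for k in range(n + 1):                 # k = number of |v_1> factors
        counts = (n - k, k)
        if any(abs(c - l * n) > delta * n for c, l in zip(counts, lam)):
            continue                       # not delta-typical, Eq. (27)
        log_eig = sum(c * math.log2(l) for c, l in zip(counts, lam))
        # property (2): 2^{-n(H+delta')} <= eigenvalue <= 2^{-n(H-delta')}
        ok = ok and (-n * (H + delta_p) <= log_eig <= -n * (H - delta_p))
        dim += math.comb(n, k)
        # add comb(n, k) * eigenvalue, computed in log space to avoid overflow
        trace += math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                          - math.lgamma(n - k + 1) + log_eig * math.log(2))
    log_dim = math.log2(dim)
    # property (3), with 1 - epsilon taken to be the trace of property (1)
    ok = ok and (math.log2(trace) + n * (H - delta_p) <= log_dim <= n * (H + delta_p))
    return ok, trace, log_dim / n, H


ok, trace, rate, H = check_typical_properties([0.8, 0.2], 0.05, 2000)
print(ok, trace, rate, H)
```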

We first prove an upper bound that for all $\delta_1$, and for sufficiently large $n$,

$$\frac{1}{n} H\big(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})\big) \le H(\mathcal{N}(\rho)) + C\delta_1 \qquad (28)$$

for some constant $C$. We will do this by showing that for any $\epsilon$, there is an $n$ sufficiently large such that we can take a typical subspace $\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}$ in $\mathcal{H}_{\rm out}^{\otimes mn}$ and project $m$ signals from a source with density matrix $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})$ onto it, such that the projection has fidelity $1-\epsilon$ with the original output of the source. Here, $\delta_2$ (and $\delta_3$, $\delta_4$) will be a linear function of $\delta_1$ (with the constant depending on $\rho$ and $\mathcal{N}$). By projecting the source onto $\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}$, we are performing Schumacher compression of the source. From the theorem on possible rates for Schumacher compression (quantum source coding) [23, 29], this implies that

$$H\big(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})\big) \le \lim_{m\to\infty} \frac{1}{m} \log \dim \mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}. \qquad (29)$$

The property (3) above for typical subspaces then implies the result.
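Consider the following process. Take a typical eigenstate

$$|s\rangle = |v_{\alpha_1}\rangle \otimes |v_{\alpha_2}\rangle \otimes \cdots \otimes |v_{\alpha_n}\rangle$$

in $\mathcal{T}_{n,\delta_1,\rho}$. Now, apply a Kraus element $A_k$ to each symbol $|v_{\alpha_j}\rangle$ of $|s\rangle$, with element $A_k$ applied with probability $\|A_k |v_{\alpha_j}\rangle\|^2$. This takes

$$|s\rangle = \bigotimes_{j=1}^{n} |v_{\alpha_j}\rangle \qquad (30)$$

to one of $c^n$ possible states $|t\rangle$. Each state is associated with a probability of reaching it; in particular, the state

$$|t\rangle = \bigotimes_{j=1}^{n} |u_{\alpha_j,\beta_j}\rangle, \qquad (31)$$

where $\beta_j$ is the index of the Kraus element applied to the $j$th symbol, is produced with probability

$$\tau = \prod_{j=1}^{n} \mu_{\alpha_j,\beta_j}. \qquad (32)$$

Notice that, for any $|s\rangle$, if the $|t_z\rangle$ and $\tau_z$ are defined as in Eqs. (31) and (32), then

$$\mathcal{N}^{\otimes n}(|s\rangle\langle s|) = \sum_{z=1}^{c^n} \tau_z |t_z\rangle\langle t_z|, \qquad (33)$$

where the sum is over all $|t\rangle$ in Eq. (31).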
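Equation (33) says that enumerating Kraus strings reproduces $\mathcal{N}^{\otimes n}(|s\rangle\langle s|)$. Here is a minimal check for $n = 2$. It assumes the amplitude damping channel with $A_0 = {\rm diag}(1,\sqrt{1-\gamma})$, $A_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$, and uses an arbitrary rotated basis $|v_0\rangle, |v_1\rangle$ in place of the eigenvectors of $\rho$. The code builds $\tau_z$ and $|t_z\rangle$ from Eqs. (31) and (32) and compares the resulting mixture with the direct tensor product. All helper names are hypothetical.

```python
import itertools
import math


def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def outer(x, y):
    return [[a * b for b in y] for a in x]


def kron(x, y):
    """Kronecker product of two vectors, or of two matrices, as nested lists."""
    if not isinstance(x[0], list):
        return [a * b for a in x for b in y]
    return [[a * b for a in rx for b in ry] for rx in x for ry in y]


def channel(kraus, m):
    """N(m) = sum_k A_k m A_k^T for real Kraus matrices."""
    d = len(kraus[0])
    return [[sum(A[i][a] * m[a][b] * A[j][b] for A in kraus
                 for a in range(len(m)) for b in range(len(m)))
             for j in range(d)] for i in range(d)]


gamma, theta = 0.35, 0.4
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]],
         [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]
v = [[math.cos(theta), math.sin(theta)],        # |v_0>, |v_1>: a rotated basis
     [-math.sin(theta), math.cos(theta)]]       # standing in for rho's eigenvectors
alpha = (0, 1)                                  # |s> = |v_0> (x) |v_1>, n = 2

mixture = [[0.0] * 4 for _ in range(4)]
total_tau = 0.0
for beta in itertools.product(range(len(kraus)), repeat=len(alpha)):
    tau, t = 1.0, [1.0]
    for a, b in zip(alpha, beta):
        out = apply(kraus[b], v[a])             # A_beta |v_alpha>
        mu = sum(x * x for x in out)            # mu_{alpha,beta}, Eq. (23)
        if mu == 0.0:
            tau = 0.0
            break
        tau *= mu                               # Eq. (32)
        t = kron(t, [x / math.sqrt(mu) for x in out])   # |u_{alpha,beta}>, Eq. (31)
    if tau == 0.0:
        continue
    total_tau += tau
    for i in range(4):
        for j in range(4):
            mixture[i][j] += tau * t[i] * t[j]

# Left-hand side of Eq. (33): N(|v_0><v_0|) (x) N(|v_1><v_1|).
direct = kron(channel(kraus, outer(v[0], v[0])), channel(kraus, outer(v[1], v[1])))
```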

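We will now see what happens when $|t_z\rangle$ is projected onto a typical subspace $\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}$ associated with $\mathcal{N}(\rho)^{\otimes n}$. We get that the fidelity of this projection is

$$\langle t_z| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_z\rangle = \sum_{r} \langle r|t_z\rangle\langle t_z|r\rangle, \qquad (34)$$

where the sum is taken over all $\delta_2$-typical eigenstates $|r\rangle$ of $\mathcal{N}(\rho)^{\otimes n}$. Now, we compute the average fidelity (using the probability distribution $\tau$) over all states $|t_z\rangle$ produced from a given $\delta_1$-typical eigenstate $|s\rangle = \bigotimes_j |v_{\alpha_j}\rangle$:

$$\begin{aligned} \sum_z \tau_z \langle t_z| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_z\rangle &= \sum_z \tau_z \sum_r \langle r|t_z\rangle\langle t_z|r\rangle \\ &= \sum_r \langle r| \mathcal{N}^{\otimes n}(|s\rangle\langle s|) |r\rangle \\ &= \sum_r \prod_{j=1}^{n} p_{\alpha_j \gamma_j}, \end{aligned} \qquad (35)$$

where $|r\rangle = \bigotimes_j |w_{\gamma_j}\rangle$ runs over the $\delta_2$-typical eigenstates. Here the last step is an application of Eq. (25). The above quantity has a completely classical interpretation; it is the probability that if we start with the $\delta_1$-typical sequence $|s\rangle = \bigotimes_j |v_{\alpha_j}\rangle$, and take $|v_\alpha\rangle$ to $|w_\gamma\rangle$ with probability $p_{\alpha\gamma}$, we end up with a $\delta_2$-typical sequence of the $|w_\gamma\rangle$.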

We will now show that the projection onto $\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}$ of the average state $|t_z\rangle$ generated from a $\delta_1$-typical eigenstate $|s\rangle$ of $\rho^{\otimes n}$ has expected trace at least $1-\epsilon$. This will be needed for the lower bound, and a similar result, using the same calculations, will be used for the upper bound. We know that the original sequence $|s\rangle$ is $\delta_1$-typical, that is, each of the eigenvectors $|v_j\rangle$ appears approximately $n\lambda_j$ times. Now, the process of first applying $A_k$ to each of the symbols, and then projecting the result onto the eigenvectors of $\mathcal{N}(\rho)^{\otimes mn}$, takes $|v_j\rangle$ to $|w_k\rangle$ with probability $p_{jk}$. We start with a $\delta_1$-typical sequence $|s\rangle$, so we have

$$N_{|v_j\rangle}(|s\rangle) = (\lambda_j + \Delta_j)\, mn \qquad (36)$$

where $|\Delta_j| \le \delta_1$. Taking the state $|s\rangle = \bigotimes_j |v_{\alpha_j}\rangle$ to $|r\rangle = \bigotimes_j |w_{\gamma_j}\rangle$, and using Eq. (26), we get

$$\mathbb{E}\big(N_{|w_k\rangle}(|r\rangle)\big) = \Big(\omega_k + \sum_j \Delta_j p_{jk}\Big) mn = (\omega_k + \Delta'_k)\, mn \qquad (37)$$

where $|\Delta'_k| \le d\delta_1$. The quantity $N_{|w_k\rangle}(|r\rangle)$ is determined by the sum of $mn$ independent random variables whose values are either 0 or 1. Let the expected average of these variables be $\mu_k = \omega_k + \Delta'_k$. Chernoff's bound says that for such a variable $X$ which is the sum of $N$ independent trials, with $\mu N$ the expected value of $X$,

$$\Pr[X - \mu N < -a] < e^{-2a^2/N}, \qquad \Pr[X - \mu N > a] < e^{-2a^2/N}.$$

Together, these bounds show that

$$\Pr\Big[\big| N_{|w_k\rangle}(|r\rangle) - (\omega_k + \Delta'_k)\, mn \big| > \delta mn\Big] < 2 e^{-2\delta^2 mn}. \qquad (38)$$

If we take $\delta_2 = (d+1)\delta_1$, then by Chernoff's bound (applied with $\delta = \delta_1$, together with $|\Delta'_k| \le d\delta_1$ and a union bound over $k$), for every $\epsilon$ there are sufficiently large $mn$ so that $|r\rangle$ is $\delta_2$-typical with probability at least $1-\epsilon$.
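The classical picture behind Eqs. (35) to (38) can be checked with exact binomial arithmetic. This sketch assumes a hypothetical binary example ($d = d_{\rm out} = 2$, made-up $\lambda_j$ and $p_{jk}$); because the output alphabet is binary, typicality of the $w_1$ count implies typicality of the $w_0$ count. It computes the probability that a $\delta_1$-typical input at either end of its allowed window produces a $\delta_2$-typical output with $\delta_2 = (d+1)\delta_1$. It also compares exact two-sided binomial tails with the bound $2e^{-2\delta^2 N}$ of Eq. (38). Function names are illustrative.

```python
import math


def binom_pmf(n, p):
    """Binomial(n, p) probabilities, computed in log space to avoid overflow."""
    if p <= 0.0:
        return [1.0] + [0.0] * n
    if p >= 1.0:
        return [0.0] * n + [1.0]
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(1 - p))
            for k in range(n + 1)]


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def prob_output_typical(n0, n1, P, omega1, delta2):
    """Probability, in the classical picture of Eq. (35), that the count of w_1
    is within delta2 * n of omega1 * n, given n0 inputs v_0 and n1 inputs v_1."""
    n = n0 + n1
    dist = convolve(binom_pmf(n0, P[0][1]), binom_pmf(n1, P[1][1]))
    return sum(q for k, q in enumerate(dist) if abs(k - omega1 * n) <= delta2 * n)


def exact_two_sided_tail(N, mu, delta):
    """Pr[|X - mu N| > delta N] for X ~ Binomial(N, mu)."""
    return sum(q for k, q in enumerate(binom_pmf(N, mu)) if abs(k - mu * N) > delta * N)


def chernoff_rhs(N, delta):
    """Right-hand side of Eq. (38) with N = mn trials: 2 exp(-2 delta^2 N)."""
    return 2 * math.exp(-2 * delta ** 2 * N)


lam = [0.7, 0.3]                       # hypothetical eigenvalues of rho
P = [[0.9, 0.1], [0.2, 0.8]]           # hypothetical p_{jk}
omega1 = lam[0] * P[0][1] + lam[1] * P[1][1]     # Eq. (26)
n, delta1 = 1000, 0.02
delta2 = (len(lam) + 1) * delta1                  # delta_2 = (d + 1) delta_1
worst = min(prob_output_typical(n - n1, n1, P, omega1, delta2)
            for n1 in (round((lam[1] - delta1) * n), round((lam[1] + delta1) * n)))
print(worst, exact_two_sided_tail(1000, 0.31, 0.05), chernoff_rhs(1000, 0.05))
```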

Now, we are ready to complete the upper bound argument. We will be using the theorem about Schumacher compression [23, 29] that if, for all sufficiently large $m$, we can compress $m$ states from a memoryless source emitting an ensemble of pure states with density matrix $\sigma$ onto a Hilbert space of dimension $2^{mH}$, and recover them with fidelity $1-\epsilon$, then $H(\sigma) \le H$.

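We first need to specify a source with density matrix $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})$. Taking a random $\delta_1$-typical eigenstate $|s\rangle$ of $\rho^{\otimes n}$ (chosen uniformly from all $\delta_1$-typical eigenstates), and premultiplying each of the tensor factors $|v_{\alpha_j}\rangle$ by $A_k$ with probability $\|A_k |v_{\alpha_j}\rangle\|^2$ (and renormalizing) to obtain a vector $|t\rangle$, gives us the desired source with density matrix $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})$. We next project a sequence of $m$ outputs from this source onto the typical subspace $\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}$. Let us analyze this process. First, we will specify a sequence $|s\rangle$ of $m$ particular $\delta_1$-typical eigenstates, $|s\rangle = |s_1\rangle|s_2\rangle \cdots |s_m\rangle$. Because each of the components $|s_i\rangle$ of this state $|s\rangle$ is $\delta_1$-typical, $|s\rangle$ is a $\delta_1$-typical eigenstate of $\rho^{\otimes mn}$. Consider the ensemble of states $|t\rangle$ generated from any particular $\delta_1$-typical $|s\rangle$ by applying the $A_k$ matrices to $|s\rangle$. It suffices to show that this ensemble can be projected onto $\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}$ with fidelity $1-\epsilon$; that is, that

$$\sum_{\mathbf{k}} \langle s| \Big(\bigotimes_j A_{k_j}\Big)^\dagger \Pi_{\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}} \Big(\bigotimes_j A_{k_j}\Big) |s\rangle \ge 1-\epsilon, \qquad (39)$$

where the sum runs over all sequences $\mathbf{k} = (k_1, \ldots, k_{mn})$ of Kraus indices. This will prove the upper bound, as by averaging over all $\delta_1$-typical states $|s\rangle$ we obtain a source with density matrix $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})$ whose projection has average fidelity $1-\epsilon$. This implies, via the theorems on Schumacher compression, that

$$H\big(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})\big) \le \frac{1}{m} \log \dim \mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)} \le n\big(H(\mathcal{N}(\rho)) + \delta_3\big), \qquad (40)$$

where $\delta_3 = \delta_2\, d_{\rm out} \log(\omega_{\max}/\omega_{\min})$; here $\omega_{\max}$ ($\omega_{\min}$) is the maximum (minimum) nonzero eigenvalue of $\mathcal{N}(\rho)$. If we let $\delta_1$ go to 0 as $n$ goes to $\infty$, we obtain the desired bound. For this argument to work, we need to make sure that $\epsilon$ is bounded independently of $|s\rangle$; this follows from the Chernoff bound.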

We need now only show that the projection of the states $|t\rangle$ generated from $|s\rangle = |s_1\rangle|s_2\rangle \cdots |s_m\rangle$ onto the typical subspace $\mathcal{T}_{mn,\delta_2,\mathcal{N}(\rho)}$ has trace at least $1-\epsilon$. We know that the original sequence $|s\rangle$ is $\delta_1$-typical, that is, each of the eigenvectors $|v_j\rangle$ appears approximately $mn\lambda_j$ times. Thus, the same argument using the law of large numbers that applied to Eq. (35) also holds here, and we have shown the upper bound for Lemma 1.

We now give the proof of the lower bound. We use the same notation and some of the same ideas and machinery as in our proof of the upper bound. Consider the distribution of $|t_z\rangle$ obtained by first picking a random (uniformly chosen) $\delta_1$-typical eigenstate $|s\rangle$ of $\rho^{\otimes n}$, and applying a matrix $A_k$ to each symbol of $|s\rangle$, with $A_k$ applied to $|v_j\rangle$ with probability $\|A_k |v_j\rangle\|^2$. This gives an ensemble of quantum states $|t_z\rangle$ with associated probabilities $\tau_z$ such that

$$\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}}) = \sum_{z} \tau_z |t_z\rangle\langle t_z|. \qquad (41)$$
2=1 
The idea for the lower bound is to choose randomly a set $T$ of size $W = 2^{n(H(\mathcal{N}(\rho)) - \delta_4)}$ from the vectors $|t_z\rangle$, according to the probability distribution $\tau_z$. We take $\delta_4 = C\delta_1$ for some constant $C$ to be determined later. We will show that with high probability (say, $1-\epsilon_2$) the selected set $T$ of $|t_z\rangle$ vectors satisfies the criteria of Hausladen et al. [17] for having a decoding observable that correctly identifies a state $|t_z\rangle$ selected at random with probability $1-\epsilon$. This means that these states can be used to send messages with rate $n(H(\mathcal{N}(\rho)) - \delta_4)(1-2\epsilon)$, showing that the density matrix of their equal mixture $\pi_T = \frac{1}{W} \sum_{z \in T} |t_z\rangle\langle t_z|$ has entropy at least $n(H(\mathcal{N}(\rho)) - \delta_4)(1-2\epsilon)$. However, the weighted average of these density matrices $\pi_T$ over all sets $T$ is $\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}}) = \sum_z \tau_z |t_z\rangle\langle t_z|$, where each $\pi_T$ is weighted according to its probability of appearing. By concavity of von Neumann entropy, $H(\mathcal{N}^{\otimes n}(\pi_{\mathcal{T}_{n,\delta_1,\rho}})) \ge n(H(\mathcal{N}(\rho)) - \delta_4)(1-2\epsilon)(1-\epsilon_2)$. By making $n$ sufficiently large, we can make $\epsilon$, $\epsilon_2$, and $\delta_4$ arbitrarily small, and so we are done.

The remaining step is to give the proof that with high probability a randomly chosen set of size $W$ of the $|t_z\rangle$ obeys the criterion of Hausladen et al. The Hausladen et al. protocol for decoding [17] is first to project onto a subspace, for which we will use the typical subspace $\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}$, and then use the square root measurement on the projected vectors. Here, the square root measurement corresponding to vectors $|v_1\rangle, |v_2\rangle, \ldots$ (these are not the eigenvectors of $\rho$ used above) is the POVM with elements

$$\Gamma^{-1/2} |v_i\rangle\langle v_i| \Gamma^{-1/2},$$

where

$$\Gamma = \sum_i |v_i\rangle\langle v_i|,$$

and $\Gamma^{-1/2}$ is taken on the support of $\Gamma$. Here, we use $|v_i\rangle = \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_i\rangle$. Hausladen et al. [17] give a criterion for the projection onto a subspace followed by the square root measurement to correctly identify a state chosen at random from the states $|t_z\rangle \in T$. Their theorem only gives the expected probability of error, but the proof can easily be modified to show that the probability of error $P_{E,i}$ in decoding the $i$th vector, $|t_i\rangle$, is at most

$$P_{E,i} \le 2(1 - S_{ii}) + \sum_{j \ne i} S_{ij} S_{ji}, \qquad (42)$$

where $S_{ii} = \langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_i\rangle$ and $S_{ij} = \langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_j\rangle$.
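The per-codeword bound (42) can be tested numerically for two codewords, where the square root of the $2\times 2$ Gram matrix $S$ has a closed form. The sketch relies on the standard fact that, for the square root measurement built from $|v_i\rangle = \Pi|t_i\rangle$, the probability of correctly decoding $|t_i\rangle$ is $((S^{1/2})_{ii})^2$. The projector (onto two coordinates of $\mathbb{R}^3$), the random real codewords, and the function names are hypothetical choices made only for illustration.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sqrt_psd_2x2(m):
    """Square root of a 2x2 real symmetric PSD matrix:
    sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M))."""
    s = math.sqrt(max(m[0][0] * m[1][1] - m[0][1] * m[1][0], 0.0))
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t], [m[1][0] / t, (m[1][1] + s) / t]]


def srm_check(t1, t2, project):
    """Project two codewords, apply the square root measurement, and return
    (error probability, right-hand side of Eq. (42)) for each codeword."""
    vs = [project(t1), project(t2)]
    S = [[dot(a, b) for b in vs] for a in vs]   # S_ij = <t_i|Pi|t_j>
    root = sqrt_psd_2x2(S)
    results = []
    for i in range(2):
        j = 1 - i
        p_err = 1 - root[i][i] ** 2             # success = ((S^{1/2})_ii)^2
        results.append((p_err, 2 * (1 - S[i][i]) + S[i][j] * S[j][i]))
    return results


def random_unit(dim):
    x = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(x, x))
    return [c / norm for c in x]


def project(x):
    """Hypothetical Pi: orthogonal projection of R^3 onto its first two axes."""
    return x[:2]


random.seed(1)
print(srm_check(random_unit(3), random_unit(3), project))
```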

We have already shown that the expectation of the first term of (42), $1 - S_{ii}$, is small, for $|t_i\rangle$ obtained from any typical eigenstate $|s\rangle$ of $\rho^{\otimes n}$. We need to give an estimate for the second term of (42). Taking expectations over all the $|t_z\rangle$, $z \ne i$, we obtain, since all the $|t_z\rangle$ are chosen independently,

$$\mathbb{E}\Big(\sum_{j \ne i} S_{ij} S_{ji}\Big) = (W-1) \sum_{z} \tau_z \big|\langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_z\rangle\big|^2 \qquad (43)$$
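where $W$ is the number of random codewords $|t_z\rangle$ we choose. We now consider a different probability distribution on the $|t_z\rangle$, which we call $\tau'_z$. This distribution is obtained by first choosing an eigenstate $|s\rangle$ of $\rho^{\otimes n}$ with probability proportional to its eigenvalue (rather than choosing uniformly among $\delta_1$-typical eigenstates of $\rho^{\otimes n}$), and then applying a Kraus element to each of its symbols to obtain a word $|t\rangle$ (as before, $A_k$ is applied to $|v_j\rangle$ with probability $\|A_k |v_j\rangle\|^2$). Observe that $\tau_z \le (1-\epsilon)^{-1}\, 2^{2\delta' n}\, \tau'_z$, where $\delta' = \delta_1 d \log(\lambda_{\max}/\lambda_{\min})$ and $1-\epsilon$ is the trace bound of property (1). This holds because the difference between the two distributions $\tau$ and $\tau'$ stems from the probability with which an eigenstate $|s\rangle$ of $\rho^{\otimes n}$ is chosen; from the properties of typical subspaces, the eigenvalue of every typical eigenstate $|s\rangle$ of $\rho^{\otimes n}$ is no less than $2^{-n(H(\rho)+\delta')}$, and the number of such eigenstates is at least $(1-\epsilon)\, 2^{n(H(\rho)-\delta')}$. Thus, we have

$$\begin{aligned} (W-1) \sum_z \tau_z \big|\langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_z\rangle\big|^2 &\le \frac{W\, 2^{2\delta' n}}{1-\epsilon} \sum_z \tau'_z \big|\langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_z\rangle\big|^2 \\ &= \frac{W\, 2^{2\delta' n}}{1-\epsilon} \langle t_i| \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}}\, \mathcal{N}(\rho)^{\otimes n}\, \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}} |t_i\rangle \\ &\le \frac{W\, 2^{2\delta' n}}{1-\epsilon}\, 2^{-n(H(\mathcal{N}(\rho)) - \delta_3)}, \end{aligned} \qquad (44)$$

where the equality uses $\sum_z \tau'_z |t_z\rangle\langle t_z| = \mathcal{N}^{\otimes n}(\rho^{\otimes n}) = \mathcal{N}(\rho)^{\otimes n}$, and the last inequality follows from property (2) of typical subspaces, which gives a bound on the maximum eigenvalue of $\Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}}\, \mathcal{N}(\rho)^{\otimes n}\, \Pi_{\mathcal{T}_{n,\delta_2,\mathcal{N}(\rho)}}$. Thus, if we make $W = 2^{n(H(\mathcal{N}(\rho)) - 2\delta' - \delta_3 - \delta)}$ for a small $\delta > 0$ proportional to $\delta_1$, the expected second term of (42) is at most $2^{-n\delta}/(1-\epsilon)$, so the error bound (42) is small, and the proof of Lemma 1 is complete.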
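The comparison of $\tau$ and $\tau'$ can be checked exactly for a qubit $\rho$, again grouping eigenstates by composition. This sketch (illustrative names, base-2 logarithms) computes the largest value of $\log_2(\tau(s)/\tau'(s))$ over $\delta$-typical eigenstates $|s\rangle$. It compares that value with the bound $2n\delta' - \log_2 {\rm Tr}\, \Pi \rho^{\otimes n} \Pi$, which follows from the lower bounds on the eigenvalues and on the dimension of the typical subspace.

```python
import math


def ratio_bound_check(lam, delta, n):
    """For a qubit rho = diag(lam), return (max over delta-typical s of
    log2(tau(s) / tau'(s)), the bound 2 n delta' - log2(trace))."""
    delta_p = delta * len(lam) * math.log2(max(lam) / min(lam))   # delta'
    typical_k = [k for k in range(n + 1)                          # k = count of |v_1>
                 if abs(k - lam[1] * n) <= delta * n
                 and abs((n - k) - lam[0] * n) <= delta * n]
    dim = sum(math.comb(n, k) for k in typical_k)                 # dim of typical subspace
    log_eig = {k: (n - k) * math.log2(lam[0]) + k * math.log2(lam[1]) for k in typical_k}
    trace = sum(math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                         + log_eig[k] * math.log(2)) for k in typical_k)
    # tau(s) = 1 / dim (uniform on typical s), tau'(s) = eigenvalue of s
    worst = max(-math.log2(dim) - log_eig[k] for k in typical_k)
    return worst, 2 * n * delta_p - math.log2(trace)


print(ratio_bound_check([0.8, 0.2], 0.05, 500))
```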

We used frequency-typical subspaces rather than entropy-typical subspaces in the proof of Lemma 1; this appears to be the most natural method of proof. Holevo [20] has found a more direct proof of Lemma 1, which also uses frequency-typical subspaces. Frequency-typical sequences are commonly used in classical information theory, although they have not yet seen much use in quantum information theory, possibly because the quantum information community has not had much exposure to them. One can ask whether Lemma 1 still holds for entropy-typical subspaces. This is not only a natural question, but might also be a method of extending Lemma 1 to the case where ${\rm supp}(\rho)$ is a countable-dimension Hilbert space, a case where the method of frequency-typical subspaces does not apply. The difficulty with using entropy-typical subspaces in our current proof is that an eigenstate $|s\rangle$ of $\rho^{\otimes n}$ which is entropy-typical but not frequency-typical will in general not be mapped to a mixed state $\mathcal{N}^{\otimes n}(|s\rangle\langle s|)$ having most of its mass close to the typical eigenspace of $\mathcal{N}(\rho)^{\otimes n}$. This means that the Schumacher compression argument is no longer valid. One way to fix the problem is to require an extra condition on the eigenvectors of the typical subspace which implies that most of their mass is indeed mapped somewhere close to the typical eigenspace of $\mathcal{N}(\rho)^{\otimes n}$. We have found such a condition (automatically satisfied by frequency-typical eigenvectors), and believe this may indeed be useful for studying the countable-dimensional case.
C. Proof of the Upper Bound

We prove an upper bound of

$$C_E \le \max_\rho\, H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big), \qquad (45)$$

where $\Phi_\rho$ is a purification of $\rho$.

As in the proof of the lower bound, this proof works by first proving the result in a special case and then using this special case to obtain the general result. Here, the special case is when Alice's protocol is restricted to encode the signal using a unitary transformation of her half of the entangled state $\phi$. This special case is proved by analyzing the possible protocols, applying the capacity formula (8) of Holevo and of Schumacher and Westmoreland [19, 31], and then applying several entropy inequalities.

First, consider a channel $\mathcal{N}$ with entanglement-assisted capacity $C_E$. By the definition of entanglement-assisted capacity, for every $\epsilon$, there is a protocol that uses the channel $\mathcal{N}$ and some block length $n$, that achieves capacity $C_E - \epsilon$, and that does the following:

Alice and Bob start by sharing a pure entangled state $\phi$, independent of the classical data Alice wishes to send. (Protocols where they start with a mixed entangled state can easily be simulated by ones starting with a pure state, although possibly at the cost of additional entanglement.) Alice then performs some superoperator $A_x$ on her half of $\phi$ to get $(A_x \otimes I)(\phi)$, where $A_x$ depends on the classical data $x$ she wants to send. She then sends her half of $(A_x \otimes I)(\phi)$ through the channel $\mathcal{N}^{\otimes n}$ formed by the tensor product of $n$ uses of the channel $\mathcal{N}$. Bob then possibly waits until he receives many of these states $(\mathcal{N}^{\otimes n} \otimes I)(A_x \otimes I)(\phi)$, and applies some decoding procedure to them.

This follows from the definition of entanglement-assisted capacity using only forward communication. Without feedback from Bob to Alice, Alice can do no better than encode all her classical information at once, by applying a single classically chosen completely positive map $A_x$ to her half of the entangled state $\phi$, and then sending it to Bob through the noisy channel $\mathcal{N}^{\otimes n}$. (If, on the contrary, feedback were allowed, it might be advantageous to use a protocol requiring several rounds of communication.) Note that the present formalism includes situations where Alice doesn't use the entangled state $\phi$ at all, because the map $A_x$ can completely discard all the information in $\phi$.

In this section, we assume that $A_x$ is a unitary transformation $U_x$. Once we have derived an upper bound assuming that Alice's transformations are unitary, we will use this upper bound to show that allowing her to use non-unitary transformations does not help her. This is proved using the strong subadditivity property of von Neumann entropy; the proof (Lemma 2) will be deferred to the next section.

The next step in our proof is to apply the Holevo formula, Eq. (8), to the tensor product channel $\mathcal{N}^{\otimes n}$ formed by $n$ uses of the channel. For the $x$th signal state, Alice sends her half of $(U_x \otimes I)(\phi)$ through the channel $\mathcal{N}^{\otimes n}$, and Bob receives $(\mathcal{N}^{\otimes n} \otimes I)(U_x \otimes I)(\phi)$. Bob's state can be divided into two parts. The first of these is his half of $\phi$, which, after Alice's part is traced out, is always in state $\mathrm{Tr}_A \phi$. The second part is the state Alice sent through the channel, which, after Bob's part is traced out, is in state $\mathcal{N}^{\otimes n}(\rho_x)$, where $\rho_x = \mathrm{Tr}_B\big((U_x \otimes I)(\phi)\big)$. Bob is trying to decode information from the output of many blocks, each containing $n$ uses of the channel, together with his half of the associated entangled states, i.e., from many blocks of the form $(\mathcal{N}^{\otimes n} \otimes I)(U_x \otimes I)(\phi)$. Since these blocks are not entangled with each other, the Holevo-Schumacher-Westmoreland theorem [19, 31] applies, and the capacity is given by formula (8), considering these blocks to be the signal states. The first term of formula (8) is the entropy of the average block, and this is bounded by

$$H\Big(\mathcal{N}^{\otimes n}\Big(\sum_x p_x \rho_x\Big)\Big) + H(\rho_x). \qquad (46)$$

The first term in (46) is the entropy of the average state that Bob receives through the channel, i.e., $\mathcal{N}^{\otimes n}\big(\sum_x p_x U_x (\mathrm{Tr}_B \phi) U_x^\dagger\big)$, and the second term is the entropy of the state that Bob retained all the time, i.e., $\mathrm{Tr}_A \phi$. That the sum of the two terms is a bound for the entropy follows from the subadditivity property of von Neumann entropy: the entropy of a joint system is bounded from above by the sum of the entropies of the two systems [28]. We can use $H(\rho_x)$ for the second term because Alice is using a unitary transformation to produce $\rho_x$ from her half of the entangled state $\phi$ she shares with Bob, so the entropy $H(\rho_x) = H(\mathrm{Tr}_B \phi)$ is the same for all $x$; and since we assume that Alice and Bob share a pure quantum state, the entropy of Bob's half, $H(\mathrm{Tr}_A \phi)$, is the same as the entropy of Alice's half. Although this is not the most obvious expression for this second term of (46), it will facilitate later manipulations.

The second term of formula (8) is the average entropy of the state Bob receives, and this is

$$\sum_x p_x H\big((\mathcal{N}^{\otimes n} \otimes I)(\Phi_{\rho_x})\big), \qquad (47)$$

where $\Phi_{\rho_x}$ is a purification of $\rho_x$. This formula holds because Alice's and Bob's joint state after Alice's unitary transformation $U_x$ is still a pure state, and so their joint state is a purification of $\rho_x$.
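We thus get

$$n(C_E - \epsilon) \le H\Big(\mathcal{N}^{\otimes n}\Big(\sum_x p_x \rho_x\Big)\Big) + \sum_x p_x H(\rho_x) - \sum_x p_x H\big((\mathcal{N}^{\otimes n} \otimes I)(\Phi_{\rho_x})\big). \qquad (48)$$

However, by Lemma 3, which we prove in the next section, the last two terms in this formula are a concave function of $\rho_x$, so we can move the sum inside these terms, and we get

$$C_E - \epsilon \le \frac{1}{n}\Big(H\big(\mathcal{N}^{\otimes n}(\rho)\big) + H(\rho) - H\big((\mathcal{N}^{\otimes n} \otimes I)(\Phi_\rho)\big)\Big), \qquad (49)$$

where

$$\rho = \sum_x p_x \rho_x.$$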

Finally, the expression (9) for $C_E$ is additive (this will be discussed in the next section), so that

$$C_E(\mathcal{N}_1 \otimes \mathcal{N}_2) = C_E(\mathcal{N}_1) + C_E(\mathcal{N}_2). \qquad (50)$$

Using this, we can set $n = 1$ in Eq. (49), thus replacing $\mathcal{N}^{\otimes n}$ by $\mathcal{N}$. Since this equation holds for any $\epsilon > 0$, we obtain the desired formula (45).



D. Proofs of the Lemmas

This section discusses three lemmas needed for the previous section. The first of these shows that without loss of capacity, Alice can use a unitary transform for encoding. The next shows that the last two terms of the formula for $C_E$ in Eq. (9) are a concave function of $\rho$. The last lemma shows that the formula for $C_E$ is additive. The first two lemmas use the property of strong subadditivity for von Neumann entropy. Originally, we also had a fairly complicated proof for the third lemma. However, Prof. Holevo has pointed out that a much simpler proof (also using strong subadditivity) was already in the literature, and so we will merely cite it.

For the proofs of the first two lemmas in this section, we need the strong subadditivity property of von Neumann entropy [25, 28]. This property says that if $A$, $B$, and $C$ are quantum systems, then

$$H(\rho_{AB}) + H(\rho_{AC}) \ge H(\rho_{ABC}) + H(\rho_A). \qquad (51)$$

It turns out to be a surprisingly strong property.
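In the special case where all three systems are classical (all density matrices diagonal in a product basis), Eq. (51) reduces to nonnegativity of the conditional mutual information $I(B;C|A)$. The sketch below checks only this classical case on random joint distributions over a $2 \times 3 \times 2$ alphabet, using hypothetical helper names; the genuinely quantum inequality is much harder to prove and is not tested here.

```python
import itertools
import math
import random


def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(joint, keep):
    """Marginalize a dict {(a, b, c): prob} onto the coordinates in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def ssa_gap(joint):
    """H(AB) + H(AC) - H(ABC) - H(A), which Eq. (51) says is >= 0."""
    def H(keep):
        return shannon(marginal(joint, keep).values())
    return H((0, 1)) + H((0, 2)) - H((0, 1, 2)) - H((0,))


random.seed(0)
min_gap = float("inf")
for _ in range(100):
    weights = {o: random.random() for o in itertools.product(range(2), range(3), range(2))}
    total = sum(weights.values())
    joint = {o: w / total for o, w in weights.items()}
    min_gap = min(min_gap, ssa_gap(joint))
print(min_gap)
```

For the uniform distribution, where $B$ and $C$ are independent given $A$, the gap is exactly zero.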

We need to show that if Alice uses non-unitary transformations $A_x$, then she can never do better than the upper bound Eq. (45) we derived by assuming that she uses only unitary transformations $U_x$. Recall that any non-unitary transformation $A_x$ on a Hilbert space $\mathcal{H}_{\rm in}$ can be performed by using a unitary transformation $U_x$ acting on the Hilbert space $\mathcal{H}_{\rm in}$ augmented by an ancilla space $\mathcal{H}_{\rm anc}$ (initially in a fixed pure state), and then tracing out the ancilla space [28]. We can assume that $\dim \mathcal{H}_{\rm anc} \le (\dim \mathcal{H}_{\rm in})^2$.

What we will do is take the channel $\mathcal{N}$ we were given, which acts on a Hilbert space $\mathcal{H}_{\rm in}$, and simulate it by a channel $\mathcal{N}'$ that acts on the Hilbert space $\mathcal{H}_{\rm in} \otimes \mathcal{H}_{\rm anc}$, where $\mathcal{N}'$ first traces out $\mathcal{H}_{\rm anc}$ and then applies $\mathcal{N}$ to the residual state on $\mathcal{H}_{\rm in}$. We can then perform any transformation $A_x$ by performing a unitary operation $U_x$ on $\mathcal{H}_{\rm in} \otimes \mathcal{H}_{\rm anc}$ and tracing out $\mathcal{H}_{\rm anc}$. Since we proved the formula Eq. (45) for unitary transformations in the previous section, we can bound $C_E$ by applying this formula to the channel $\mathcal{N}'$. What we show below is that the same formula applied to $\mathcal{N}$ gives a quantity at least as large.

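Lemma 2 Suppose that $\mathcal{N}$ and $\mathcal{N}'$ are related as described above. Let us define

$$C = \max_{\rho \text{ on } \mathcal{H}_{\rm in}} H(\rho) + H(\mathcal{N}(\rho)) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big) \qquad (52)$$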
Figure 2: In Lemma 2, $A$ is the input space for the original map $\mathcal{N}$. $AA'$ is the input space for the map $\mathcal{N}'$. The output space for both maps is $B$. The space $R$ is a reference system used to purify states in $A$ and $A'$.



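and

$$C' = \max_{\rho' \text{ on } \mathcal{H}_{\rm in} \otimes \mathcal{H}_{\rm anc}} H(\rho') + H(\mathcal{N}'(\rho')) - H\big((\mathcal{N}' \otimes I)(\Phi_{\rho'})\big). \qquad (53)$$

Then $C \ge C'$.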

Proof: To avoid double subscripts in the following calculations, we now rename our Hilbert spaces as follows. Let $A = \mathcal{H}_{\rm in}$, $A' = \mathcal{H}_{\rm anc}$, $B = \mathcal{H}_{\rm out}$, and $E = \mathcal{H}_{\rm env}$. Let $\rho'$ maximize $C'$ in the above formula. We let $\rho = \mathrm{Tr}_{A'} \rho'$. Since the channel $\mathcal{N}'$ was defined by first tracing out $A'$ and then sending the resulting state through the channel $\mathcal{N}$, $\rho$ is the density matrix of the state input to the channel $\mathcal{N}$ in the protocol.



Clearly, the middle terms in the above two formulae (52) and (53) are equal, since $\mathcal{N}(\rho) = \mathcal{N}'(\rho')$. We need to show that inequality holds for the first and last terms in $C$ and $C'$; that is, we need to show

$$H(\rho) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big) \ge H(\rho') - H\big((\mathcal{N}' \otimes I)(\Phi_{\rho'})\big). \qquad (54)$$

Recall, we have a noisy channel $\mathcal{N}$ that acts on Hilbert space $A$, and a channel $\mathcal{N}'$ that acts on Hilbert space $A \otimes A'$ by tracing out $A'$ and then sending the resulting state through $\mathcal{N}$. We need to give purifications $\Phi_\rho$ and $\Phi_{\rho'}$ of $\rho$ and $\rho'$, respectively. Note that we can take $\Phi_\rho = \Phi_{\rho'}$, since any purification of $\rho'$ is also a purification of $\rho$ (see footnote 2). Let us take these purifications over a reference system $\mathcal{H}_{\rm ref}$ that we call $R$ (so $A'R$ purifies $\rho$, and $R$ purifies $\rho'$). Consider the diagram in Figure 2. In this figure, $\rho_A = \rho$, $\rho_{AA'} = \rho'$ and $\rho_{AA'R} = |\Phi_\rho\rangle\langle\Phi_\rho| = |\Phi_{\rho'}\rangle\langle\Phi_{\rho'}|$. Then $\mathcal{N}$ maps the space $A$ to the space $B$, and $\mathcal{N}'$ maps the space $AA'$ to the space $B$ by tracing out $A'$ and performing $\mathcal{N}$.

We have $H(\rho) = H(\rho_A) = H(\rho_{A'R})$, and $H(\rho') = H(\rho_{AA'}) = H(\rho_R)$. We also have $H((\mathcal{N} \otimes I)(\Phi_\rho)) = H(\rho_{A'RB})$ and $H((\mathcal{N}' \otimes I)(\Phi_{\rho'})) = H(\rho_{RB})$.

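Thus, since $C$ is at least the value of its objective at $\rho = \mathrm{Tr}_{A'}\rho'$ while $C'$ is attained at $\rho'$,

$$\begin{aligned} C - C' &\ge H(\rho) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big) - H(\rho') + H\big((\mathcal{N}' \otimes I)(\Phi_{\rho'})\big) \\ &= H(\rho_{A'R}) - H(\rho_{A'RB}) - H(\rho_R) + H(\rho_{RB}) \qquad (55) \\ &\ge 0 \end{aligned}$$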
Figure 3: For Lemma 3, $A$ is a Hilbert space we send through the channel $\mathcal{N}$, and $B$ is the output space. This mapping $\mathcal{N}$ can be made unitary by adding an environment space $E$. We let $R$ be a reference system which purifies the systems $\rho_0$ and $\rho_1$ in $A$, and $C_1$ and $C_2$ be two qubits purifying $AR$ as described in the text.

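by strong subadditivity (51), applied with $R$ in the role of the common system and $A'$, $B$ as the other two, and we have the desired inequality.

For the next lemma, we need to prove that the function

$$H(\rho) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big)$$

is concave in $\rho$.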

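Lemma 3 Let $\rho_0$ and $\rho_1$ be two density matrices, and let $\rho = p_0\rho_0 + p_1\rho_1$ be their weighted average. Then

$$H(\rho) - H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big) \ge p_0\Big(H(\rho_0) - H\big((\mathcal{N} \otimes I)(\Phi_{\rho_0})\big)\Big) + p_1\Big(H(\rho_1) - H\big((\mathcal{N} \otimes I)(\Phi_{\rho_1})\big)\Big). \qquad (56)$$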

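Proof: We again give a diagram; see Figure 3. Here we let the states be as follows: $\rho_A = \rho = p_0\rho_0 + p_1\rho_1$, so $A$ is in the state $\rho$. We let $R$ be a reference system with which we purify the states $\rho_0$ and $\rho_1$. Consider purifications $\Phi_0 = |\phi_0\rangle\langle\phi_0|$ and $\Phi_1 = |\phi_1\rangle\langle\phi_1|$ of $\rho_0$, $\rho_1$, respectively. Then we have

$$\rho_{AR} = p_0 |\phi_0\rangle\langle\phi_0| + p_1 |\phi_1\rangle\langle\phi_1|. \qquad (57)$$

We now let $C_1$ and $C_2$ be qubits which tell whether the system $A$ is in state $\rho_0$ or $\rho_1$, and we will purify the system $\rho_{AR}$ in the system $ARC_1C_2$ in the following way:

$$|\phi_{ARC_1C_2}\rangle = \sqrt{p_0}\, |\phi_0\rangle|0\rangle|0\rangle + \sqrt{p_1}\, |\phi_1\rangle|1\rangle|1\rangle. \qquad (58)$$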
Tracing out $C_2$, we get that the state of $ARC_1$ is

$$\rho_{ARC_1} = p_0 |\phi_0\rangle\langle\phi_0| \otimes |0\rangle\langle 0| + p_1 |\phi_1\rangle\langle\phi_1| \otimes |1\rangle\langle 1|, \qquad (59)$$

so now $C_1$ can be thought of as a classical bit telling which of $\Phi_0$ or $\Phi_1$ is the state of the system $AR$. Note that we have the same expression (with $C_2$ in place of $C_1$) after tracing out $C_1$.

Now, it's time for our analysis. We want to show equation (56) above. Notice that

$$H(\rho) = H(\rho_A) = H(\rho_{RC_1C_2}),$$

since $\rho_{ARC_1C_2}$ is a pure state, and

$$H\big((\mathcal{N} \otimes I)(\Phi_\rho)\big) = H(\rho_{BRC_1C_2}).$$

Now, suppose we have a classical bit $C$ which tells whether a quantum system $X$ is in state $\rho_0$ or $\rho_1$, with probability $p_0$ and $p_1$ respectively. The following formula gives the expectation of the entropy of $X$ [34] (this is analogous to the chain rule for the entropy of classical systems):

$$\mathbb{E}\big(H(\rho_X)\big) = p_0 H(\rho_0) + p_1 H(\rho_1) = H(\rho_{XC}) - H(\rho_C). \qquad (60)$$
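Eq. (60) is easy to confirm numerically for qubits: because $\rho_{XC} = p_0\rho_0 \otimes |0\rangle\langle 0| + p_1\rho_1 \otimes |1\rangle\langle 1|$ is a direct sum over the classical register, its spectrum is the union of the spectra of $p_0\rho_0$ and $p_1\rho_1$. The states and probabilities below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def eig2(m):
    """Eigenvalues of a 2x2 real symmetric matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return [tr / 2 + disc, tr / 2 - disc]


def entropy_from_eigs(eigs):
    return -sum(x * math.log2(x) for x in eigs if x > 1e-15)


p0, p1 = 0.35, 0.65
rho0 = [[0.9, 0.2], [0.2, 0.1]]      # hypothetical qubit states
rho1 = [[0.4, -0.3], [-0.3, 0.6]]

# Spectrum of rho_XC: union of the spectra of p0 * rho0 and p1 * rho1.
eigs_XC = [p0 * x for x in eig2(rho0)] + [p1 * x for x in eig2(rho1)]
lhs = p0 * entropy_from_eigs(eig2(rho0)) + p1 * entropy_from_eigs(eig2(rho1))
rhs = entropy_from_eigs(eigs_XC) - entropy_from_eigs([p0, p1])
print(lhs, rhs)
```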

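Using this formula (60), we see that

$$\sum_{j=0}^{1} p_j H\big((\mathcal{N} \otimes I)(\Phi_{\rho_j})\big) = H(\rho_{BRC_1}) - H(\rho_{C_1}) \qquad (61)$$

and

$$\sum_{j=0}^{1} p_j H(\rho_j) = H(\rho_{AC_2}) - H(\rho_{C_2}) = H(\rho_{RC_1}) - H(\rho_{C_2}), \qquad (62)$$

where the last equality holds because $ARC_1C_2$ is in a pure state.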

Putting everything together, we get

$$H(\rho) - H\big((\mathcal{N}\otimes 1)(\Phi_\rho)\big) - \sum_{j=0}^{1} p_j\Big(H(\rho_j) - H\big((\mathcal{N}\otimes 1)(\Phi_{\rho_j})\big)\Big) = H(\rho_{RC_1C_2}) - H(\rho_{BRC_1C_2}) - H(\rho_{RC_1}) + H(\rho_{BRC_1}), \qquad (63)$$

which is nonnegative by strong subadditivity. To obtain (63), we used the equality
$H(\rho_{C_1}) = H(\rho_{C_2})$, which holds by symmetry. This concludes the proof of Lemma 3.
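Spelled out, the bookkeeping behind (63) runs as follows. It uses the instance $H(XY) + H(YZ) \ge H(XYZ) + H(Y)$ of strong subadditivity with $X = B$, $Y = RC_1$, $Z = C_2$. Equivalently, the left-hand side of (63) is the conditional mutual information $I(B; C_2 \mid RC_1)$.

```latex
\begin{aligned}
&H(\rho) - H\big((\mathcal{N}\otimes 1)(\Phi_\rho)\big)
  - \sum_{j=0}^{1} p_j\Big(H(\rho_j) - H\big((\mathcal{N}\otimes 1)(\Phi_{\rho_j})\big)\Big) \\
&\quad= H(\rho_{RC_1C_2}) - H(\rho_{BRC_1C_2})
  - \big[H(\rho_{RC_1}) - H(\rho_{C_2})\big] + \big[H(\rho_{BRC_1}) - H(\rho_{C_1})\big] \\
&\quad= \big[H(\rho_{BRC_1}) + H(\rho_{RC_1C_2})\big] - \big[H(\rho_{BRC_1C_2}) + H(\rho_{RC_1})\big]
  = I(B; C_2 \mid RC_1) \;\ge\; 0 .
\end{aligned}
```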

The final lemma we need shows that we can set $n = 1$ and replace $N = \mathcal{N}^{\otimes n}$ by
$\mathcal{N}$ in Eq. (49). This follows from the fact that $C_E$ is additive; that is, if $C_E$ is taken
to be defined by our single-letter formula, then

$$C_E(\mathcal{N}_1 \otimes \mathcal{N}_2) = C_E(\mathcal{N}_1) + C_E(\mathcal{N}_2). \qquad (64)$$

The $\ge$ direction is easy: for product inputs $\rho_1 \otimes \rho_2$, each of the three entropies in the formula is additive. We originally had a rather unwieldy proof for the $\le$ direction
based on explicitly expanding the formula for $C_E$ and differentiating; however, A.
Holevo has pointed out to us that a much simpler proof is given in [12], so we will
spare the reader our proof.



III. Examples of Ce for Specific Channels 

In this section, we discuss the capacity of two specific channels: the first is the
bosonic channel with attenuation/amplification and Gaussian noise, given a bound
on the average signal energy, and the second is the qubit amplitude damping channel.
Strictly speaking, we have not yet shown that our formula for $C_E$ holds for the Gaussian
bosonic channel, as we have proved it neither under an average energy
constraint nor for continuous channels. For channels with a linear constraint on the
average density matrix $\rho$, our proof applies unchanged. It yields the result that
the density matrix $\rho$ in the formula must be optimized over all density matrices satisfying
this linear constraint. We make no claims to have proven the formula for
continuous channels. In fact, we suspect that there may be continuous quantum
channels which have a finite entanglement-assisted capacity, but where each of the
terms of the formula is infinite for the optimal density matrix for signaling. The
theory of entanglement-assisted capacity for continuous channels is thus currently
incomplete.

For the Gaussian channel with an average energy constraint, all three terms of
the formula must be finite, since any bosonic state with finite energy has finite entropy.
For this channel, the formula can be proven by approximating the channel with a sequence
of finite-dimensional channels whose capacities can be shown to converge to the capacity
of the Gaussian channel. We do this approximation by first restricting the input
of the channel to a finite-dimensional subspace, and then projecting the output of the channel
onto a finite-dimensional subspace. (In both cases, the subspace can be taken to be that
spanned by the first $k + 1$ number basis states $|n = 0\rangle, |n = 1\rangle, \ldots, |n = k\rangle$ defined
later in this section.)



A. Gaussian Channels 

The Gaussian channel is one of the most important continuous-alphabet classical
channels, and we briefly review it here. We describe the classical complex Gaussian
channel, as this is most analogous to the quantum Gaussian channel. For a detailed
discussion of this channel see an information theory text such as [14].

A classical complex Gaussian channel $\mathcal{N}$ of noise $N$ is defined by the mapping in
the complex plane

$$\mathcal{N}: z \mapsto z', \qquad z' \sim G_N(z' - z), \qquad (65)$$

where the noise $G_N$ is a Gaussian of mean 0 and variance $N$, i.e.,

$$G_N(z) = \frac{1}{\pi N}\, e^{-|z|^2/N}. \qquad (66)$$

Without any further conditions, the capacity of this channel would be unlimited,
because we could choose an infinite set of inputs arbitrarily far apart, so that the
corresponding outputs are distinguishable with arbitrarily small probability of error.
We therefore add a constraint on the average input signal power or energy, say $S$. That
is, we require that the input distribution $W(z)$ satisfy

$$\int |z|^2\, W(z)\, d^2z \le S. \qquad (67)$$

This complex Gaussian channel is equivalent to two parallel real Gaussian channels.
It follows that the capacity of the complex Gaussian channel with average input
energy $S$ and noise $N$ is

$$C_{\rm Shan} = \log\Big(1 + \frac{S}{N}\Big), \qquad (68)$$

which is twice the capacity of a real Gaussian channel with average input energy $S$
and noise $N$.
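Here is a two-function check of (68), using helper names of our own. Split the complex channel into its two real quadratures, each carrying energy $S/2$ against noise $N/2$. Summing the two gives twice the textbook real-channel capacity $\frac12\log_2(1 + P/\sigma^2)$.

```python
import math

def complex_gaussian_capacity(S, N):
    """Eq. (68): bits per use of the complex Gaussian channel, input energy S, noise N."""
    return math.log2(1 + S / N)

def real_gaussian_capacity(P, noise_var):
    """Textbook real Gaussian channel capacity (1/2) log2(1 + P / noise_var), bits per use."""
    return 0.5 * math.log2(1 + P / noise_var)

S, N = 5.0, 2.0
# One real channel per quadrature, each with energy S/2 and noise variance N/2.
print(complex_gaussian_capacity(S, N), 2 * real_gaussian_capacity(S / 2, N / 2))
```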

Before we proceed to discuss the quantum Gaussian channel, let us first review
some basic results from quantum optics. In the quantum theory of light, each mode
of the electromagnetic field is treated as a quantum harmonic oscillator. Its annihilation
and creation operators satisfy the canonical commutation relation $[a, a^\dagger] = 1$. A detailed treatment of these
concepts is available in the book [33]. The Hilbert space corresponding to a mode is
countably infinite-dimensional. A countable orthonormal basis for this space is the number basis
of states $|n = j\rangle$, $j = 0, 1, 2, \ldots$, where the state $|n = j\rangle$ corresponds to $j$ photons
being present in the mode.

Another useful basis is that of the coherent states of light. Coherent states are
defined for complex numbers $\alpha$ as

$$|\alpha\rangle = D(\alpha)|0\rangle \qquad (69)$$

$$|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{j=0}^{\infty}\frac{\alpha^j}{\sqrt{j!}}\,|n = j\rangle, \qquad (70)$$

where $D(\alpha)$ is the unitary displacement operator and $|0\rangle = |n = 0\rangle$ is the vacuum
state containing no photons. The complex number $\alpha$ corresponds to the complex field
vector of a mode in the classical theory of light. If $\alpha = x + ip$, then $x$ is generally
called the position coordinate and $p$ the momentum coordinate. The displacement
operator corresponds to displacing the complex number labeling the coherent state,
and multiplying by an associated phase, i.e.,

$$D(\alpha)|\beta\rangle = e^{i\,{\rm Im}(\alpha\beta^*)}\,|\alpha + \beta\rangle, \qquad (71)$$

where Im takes the imaginary part of a complex number, i.e., ${\rm Im}(x + iy) = y$.
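The sketch below is a small numerical illustration of (70). The sum is truncated at $j_{\max} = 60$, which is our choice. It shows two standard facts: the photon-number distribution of $|\alpha\rangle$ is Poisson with mean $|\alpha|^2$, and two coherent states overlap with $|\langle\alpha|\beta\rangle|^2 = e^{-|\alpha-\beta|^2}$. The second fact means coherent states are not mutually orthogonal, consistent with their forming an overcomplete basis.

```python
import cmath
import math

def coherent_amplitudes(alpha, jmax=60):
    """Number-basis amplitudes <n=j|alpha> from Eq. (70), truncated at j = jmax."""
    amp = cmath.exp(-abs(alpha) ** 2 / 2)     # <n=0|alpha>
    amps = [amp]
    for j in range(1, jmax + 1):
        amp = amp * alpha / math.sqrt(j)     # ratio of consecutive terms in (70)
        amps.append(amp)
    return amps

def inner(u, v):
    """<u|v> for truncated amplitude vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

alpha, beta = 1 + 0.5j, -0.3 + 1j
a, b = coherent_amplitudes(alpha), coherent_amplitudes(beta)
norm = inner(a, a).real
mean_photons = sum(j * abs(c) ** 2 for j, c in enumerate(a))
overlap = abs(inner(a, b)) ** 2
print(norm, mean_photons, overlap, math.exp(-abs(alpha - beta) ** 2))
```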

We also need thermal states, which are the equilibrium distribution of the harmonic oscillator for a fixed temperature. The thermal state with average energy $S$ is
the state

$$T_S = \frac{1}{S+1}\sum_{j=0}^{\infty}\Big(\frac{S}{S+1}\Big)^j |n = j\rangle\langle n = j| = \frac{1}{\pi S}\int e^{-|z|^2/S}\,|z\rangle\langle z|\, d^2z. \qquad (72)$$

The entropy of the thermal state $T_S$ is

$$g(S) = (S+1)\log(S+1) - S\log S. \qquad (73)$$
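This minimal sketch checks (73) against the photon-number distribution in (72), in bits. The cutoff `jmax` is our choice.

```python
import math

def g(S):
    """Eq. (73) in bits: entropy of the thermal state T_S (g(0) = 0)."""
    if S <= 0:
        return 0.0
    return (S + 1) * math.log2(S + 1) - S * math.log2(S)

def thermal_distribution(S, jmax=3000):
    """Photon-number probabilities of T_S from Eq. (72), truncated at jmax."""
    q = S / (S + 1)
    return [(1 - q) * q ** j for j in range(jmax + 1)]

S = 3.0
P = thermal_distribution(S)
H = -sum(pj * math.log2(pj) for pj in P if pj > 0)   # entropy of the geometric distribution
mean_energy = sum(j * pj for j, pj in enumerate(P))  # should equal S
print(g(S), H, mean_energy)
```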

We are now ready to define the quantum analog of the classical Gaussian channel.
(See [?] for a much more detailed treatment of quantum Gaussian channels.) Coherent states are an overcomplete basis, and a quantum channel may be defined by
its action on coherent states. We restrict our discussion to quantum Gaussian channels with one mode and no squeezing, which are those most analogous to classical
Gaussian channels. These channels have an attenuation/amplification parameter $k$
and a noise parameter $N$. The channel amplifies the signal (necessarily introducing
noise) if $k > 1$, and attenuates the signal if $k < 1$. Amplification/attenuation of
the quantum state intuitively corresponds to multiplying the average position and
momentum coordinates by the number $k$. Suppose this were possible for $k > 1$ without
introducing any extra noise. One could then violate the Heisenberg uncertainty
principle and measure the position and momentum coordinates simultaneously to any
degree of accuracy, by first amplifying the signal and then simultaneously measuring
these coordinates with optimal quantum uncertainty. To ensure that the channel
is a completely positive map, amplification must therefore introduce
extra quantum noise. The channel $\mathcal{M}$ with noise $N$ and attenuation/amplification
parameter $k$ acts on coherent states as

$$\mathcal{M}(|\alpha\rangle\langle\alpha|) = D(k\alpha)\, T_N\, D(k\alpha)^\dagger \quad \text{for } k < 1,$$

$$\mathcal{M}(|\alpha\rangle\langle\alpha|) = D(k\alpha)\, T_{N + k^2 - 1}\, D(k\alpha)^\dagger \quad \text{for } k > 1. \qquad (74)$$



The entanglement-assisted capacity of Gaussian channels was calculated in [1].
The density matrix $\rho$ maximizing $C_E$ is a thermal state of average energy $S$, and the
entanglement-assisted capacity is given by

$$C_E = g(S) + g(S') - g\Big(\frac{D + S' - S - 1}{2}\Big) - g\Big(\frac{D - S' + S - 1}{2}\Big). \qquad (75)$$

Here $S$ is the average input energy, and $S'$ is the average output energy:

$$S' = k^2 S + N \quad \text{for } k < 1,$$

$$S' = k^2 S + N + k^2 - 1 \quad \text{for } k > 1; \qquad (76)$$

and

$$D = \sqrt{(S + S' + 1)^2 - 4k^2 S(S+1)}. \qquad (77)$$

The first term of (75), $g(S)$, is the entropy of the input; the second term, $g(S')$, is
the entropy of the output; and the remaining two terms of (75) are the entropy of a
purification of the thermal state $T_S$ after half of it has passed through the channel.

Figure 4: This figure shows the curves given by the ratio of capacities $C_E/C_{\rm Shan}$ for
the quantum Gaussian channel with noise $N$ and the nine combinations of values:
amplification/attenuation parameter $k = 0.1$, 1, or 3; and signal strength $S = 0.1$, 1,
or 10. The dotted curves have $S = 0.1$; the solid curves have $S = 1$; and the dashed
curves have $S = 10$. Within each set, the curves have the values $k = 0.1$, $k = 1$, and
$k = 3$ from bottom to top.
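Here is a minimal sketch of evaluating (75)-(77) numerically, in bits. The function names are ours. The first two usage lines are simple consistency checks, not results from the text:

- For a noiseless $k = 1$ channel, (75) reduces to $2g(S)$.
- For $k = 0$, the output no longer depends on the input.

```python
import math

def g(S):
    """Thermal-state entropy, Eq. (73), in bits; arguments <= 0 give 0."""
    return 0.0 if S <= 0 else (S + 1) * math.log2(S + 1) - S * math.log2(S)

def output_energy(S, N, k):
    """Average output energy S' from Eq. (76)."""
    return k * k * S + N + (k * k - 1 if k > 1 else 0.0)

def gaussian_CE(S, N, k):
    """Entanglement-assisted capacity in bits from Eqs. (75)-(77)."""
    Sp = output_energy(S, N, k)
    D = math.sqrt((S + Sp + 1) ** 2 - 4 * k * k * S * (S + 1))
    return g(S) + g(Sp) - g((D + Sp - S - 1) / 2) - g((D - Sp + S - 1) / 2)

print(gaussian_CE(1.0, 0.0, 1.0))   # noiseless k = 1 case: (75) reduces to 2 g(S)
print(gaussian_CE(2.0, 1.5, 0.0))   # k = 0: the two pairs of terms cancel
print(gaussian_CE(1.0, 2.0, 1.0))   # a noisy k = 1 example
```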

The asymptotics of this formula are interesting. Let us hold the signal strength
$S$ fixed, and let the noise $N$ go to infinity. Then

$$\frac{C_E}{C_{\rm Shan}} \to (S+1)\ln\Big(1 + \frac{1}{S}\Big), \qquad (78)$$

which is independent of the attenuation/amplification parameter $k$. This ratio shows
that the entanglement-assisted capacity can exceed the Shannon capacity by an arbitrarily large factor, albeit only when the signal strength $S$ is very small. We have plotted
$C_E/C_{\rm Shan}$ for some parameters in Figs. 4 and 5.
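The limit (78) can be checked numerically with a sketch like the following. Two assumptions are ours. First, $C_{\rm Shan}$ is taken to be $\log(1 + k^2 S/N)$, the Shannon formula with received signal $k^2 S$. Second, $N = 10^4$ stands in for $N \to \infty$. Natural logarithms are used throughout. The ratio of the two capacities does not depend on the base, but the closed form in (78) approaches 1 as $S \to \infty$ only with the natural logarithm.

```python
import math

def g_nats(S):
    """Eq. (73) with natural logarithms."""
    return 0.0 if S <= 0 else (S + 1) * math.log(S + 1) - S * math.log(S)

def gaussian_CE_nats(S, N, k):
    """Eqs. (75)-(77) in nats."""
    Sp = k * k * S + N + (k * k - 1 if k > 1 else 0.0)
    D = math.sqrt((S + Sp + 1) ** 2 - 4 * k * k * S * (S + 1))
    return (g_nats(S) + g_nats(Sp)
            - g_nats((D + Sp - S - 1) / 2) - g_nats((D - Sp + S - 1) / 2))

def shannon_nats(S, N, k):
    """Assumption: C_Shan = log(1 + k^2 S / N), i.e. received signal k^2 S."""
    return math.log(1 + k * k * S / N)

def limiting_ratio(S):
    """Right-hand side of Eq. (78)."""
    return (S + 1) * math.log(1 + 1 / S)

for S in (0.1, 1.0):
    for k in (0.5, 1.0, 3.0):
        ratio = gaussian_CE_nats(S, 1e4, k) / shannon_nats(S, 1e4, k)
        print(S, k, ratio, limiting_ratio(S))
```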

Possibly a better comparison than that of $C_E$ to $C_{\rm Shan}$ would be that of $C_E$ to
$C_H$, as $C_H$ is the best rate known for sending classical information over a quantum
channel without the use of shared entanglement. However, the optimal set of signal states
to maximize $C_H$ for Gaussian channels is not known. For one-mode Gaussian channels
with no squeezing, it is conjectured to be a thermal distribution of coherent states [?].
If this conjecture is correct, then $C_H < C_{\rm Shan}$ for these channels, so the ratio
$C_E/C_{\rm Shan}$ underestimates $C_E/C_H$; see Fig. 6.

Figure 5: The solid curves show the ratio of capacities $C_E/C_{\rm Shan}$ for the quantum
Gaussian channel with signal strength $S$, amplification/attenuation parameter $k = 1$
and noise $N = 0.1$, 0.3, 1, 3, and 10 (from bottom to top). The dashed curve is the
limit of the solid curves as $N$ goes to $\infty$; namely, $C_E/C_{\rm Shan} = (S + 1)\ln(1 + 1/S)$.
These curves approach $\infty$ as $S$ goes to 0, and approach 1 as $S$ goes to $\infty$.

Some simple bounds on $C_E$ for the quantum Gaussian channel can be obtained
using the techniques of [?]. Suppose that Alice takes a complex number $\alpha$, encodes
it as the state $|\alpha\rangle$, and sends this through a quantum Gaussian channel. Bob then
measures it in the coherent state basis. Here, the measurement step adds 1 to the
noise. This channel is thus equivalent to a classical Gaussian channel with average
received signal strength $k^2 S$, and average noise $N + 1$ if $k < 1$ and $N + k^2$ if $k > 1$. The
quantum Gaussian channel must then have capacity at least the capacity of
this classical Gaussian channel. Conversely, Alice and Bob can simulate a quantum
Gaussian channel by using a classical complex Gaussian channel: Alice measures her
state (in the coherent state basis), sends the result through the classical channel,
and Bob prepares a coherent state that depends on the signal he receives. If Alice
starts with a state $|\alpha\rangle$ and measures it, she obtains a complex number $\alpha + \varepsilon$,
where $\varepsilon$ is a Gaussian with mean 0 and variance 1. She can then multiply by $k$ to
get $k\alpha + k\varepsilon$, whose average energy is $k^2(S + 1)$. To simulate the quantum Gaussian channel, she must send this value
through a classical channel with noise $N - k^2$ if $k < 1$, and $N - 1$ if $k > 1$. The capacity of this
classical channel must then be at least $C_E$ for the quantum
Gaussian channel it is simulating. The arguments in this paragraph thus give bounds
of

$$\log\Big(1 + \frac{k^2 S}{N + k^2}\Big) \le C_E \le \log\Big(1 + \frac{k^2(S + 1)}{N - 1}\Big) \qquad (79)$$

for $k > 1$, and of

$$\log\Big(1 + \frac{k^2 S}{N + 1}\Big) \le C_E \le \log\Big(1 + \frac{k^2 (S + 1)}{N - k^2}\Big) \qquad (80)$$

for $k < 1$. If we hold $S/N$ fixed, and let both these variables go to infinity, we
find that these bounds all go to $\log(1 + k^2 S/N)$. This corresponds to the classical
Shannon bound, since the signal strength at the receiver is $k^2 S$.

Figure 6: The values of the capacities $C_E$, $C_{\rm Shan}$, and the conjectured $C_H$ (in units of
bits) are plotted for the Gaussian channel with signal strength $S$, noise $N = 1$, and no
amplification or attenuation ($k = 1$). As the curves approach 0, their leading-order
behavior is as follows: $C_H \sim S$, $C_{\rm Shan} \sim (\log_2 e)S$, and $C_E \sim -\tfrac{1}{2}S\log_2 S$, so the
ratios $C_E/C_{\rm Shan}$ and $C_E/C_H$ approach $\infty$ as $S$ goes to 0.
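As a consistency check (not a proof), the sketch below evaluates the closed form (75) and the simple bounds (79)-(80) for a few parameter choices of our own. It uses only values where the simulating classical channel has positive noise. The helper names are illustrative.

```python
import math

def g(S):
    """Thermal-state entropy, Eq. (73), in bits."""
    return 0.0 if S <= 0 else (S + 1) * math.log2(S + 1) - S * math.log2(S)

def gaussian_CE(S, N, k):
    """Eqs. (75)-(77), in bits."""
    Sp = k * k * S + N + (k * k - 1 if k > 1 else 0.0)
    D = math.sqrt((S + Sp + 1) ** 2 - 4 * k * k * S * (S + 1))
    return g(S) + g(Sp) - g((D + Sp - S - 1) / 2) - g((D - Sp + S - 1) / 2)

def simple_bounds(S, N, k):
    """(lower, upper) from Eqs. (79) and (80).

    The upper bound needs positive classical noise: N > 1 for k > 1, N > k^2 otherwise.
    """
    if k > 1:
        return (math.log2(1 + k * k * S / (N + k * k)),
                math.log2(1 + k * k * (S + 1) / (N - 1)))
    return (math.log2(1 + k * k * S / (N + 1)),
            math.log2(1 + k * k * (S + 1) / (N - k * k)))

cases = [(1, 2, 1), (1, 2, 3), (1, 1, 0.5), (0.3, 4, 0.8), (5, 3, 2)]
for S, N, k in cases:
    lo, hi = simple_bounds(S, N, k)
    print(S, N, k, lo, gaussian_CE(S, N, k), hi)
```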

If $k = 1$, we can compute better bounds than these based on continuous-variable
quantum teleportation and superdense coding. Alice and Bob can use a shared entangled squeezed state to teleport a continuous quantum variable [10]. They can also
use such a state for a superdense coding protocol, involving one channel use per shared
state, that increases the classical capacity of a quantum channel [11]. The squeezed
state used, with squeezing parameter $r > 0$, is expressed in the number basis as

$$|s_r\rangle = \frac{1}{\cosh r}\sum_{j=0}^{\infty} (\tanh r)^j\, |n_A = j\rangle|n_B = j\rangle, \qquad (81)$$
where $n_A$ and $n_B$ are the photon numbers in Alice's and Bob's modes, respectively.
This state is squeezed, which means that it cannot be represented as a mixture
of coherent states with positive coefficients. In this state, the uncertainty in the
difference of Alice's and Bob's position coordinates, $x_A - x_B$, is reduced, as is the
uncertainty in the sum of their momentum coordinates, $p_A + p_B$. The conjugate
variables, $x_A + x_B$ and $p_A - p_B$, have increased uncertainty. If Alice and Bob measure
their position coordinates, the difference of these coordinates is a Gaussian variable
with mean 0 and variance $e^{-2r}$, while the sum is a Gaussian with mean 0 and variance
$e^{2r}$. Similarly, if they measure their momentum coordinates, the sum has variance
$e^{-2r}$ while the difference has variance $e^{2r}$. Further, if either Alice's or Bob's state is
considered separately, it is a thermal state with average energy $\sinh^2 r$.
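The claim that each half of $|s_r\rangle$ is thermal can be checked directly from (81). Tracing out Bob leaves Alice with photon-number probabilities $\tanh^{2j} r/\cosh^2 r$. These coincide with the thermal distribution (72) for $S = \sinh^2 r$. The sketch below confirms this numerically; the helper names and the truncation `jmax` are our own choices.

```python
import math

def reduced_photon_distribution(r, jmax=500):
    """Photon-number distribution of Alice's half of |s_r>, from Eq. (81).

    Tracing out Bob leaves a diagonal state with P(j) = tanh(r)^(2j) / cosh(r)^2.
    """
    t2 = math.tanh(r) ** 2
    return [t2 ** j / math.cosh(r) ** 2 for j in range(jmax + 1)]

def thermal_distribution(S, jmax=500):
    """Photon-number distribution of the thermal state T_S, Eq. (72)."""
    return [(S / (S + 1)) ** j / (S + 1) for j in range(jmax + 1)]

r = 0.8
P = reduced_photon_distribution(r)
mean_energy = sum(j * pj for j, pj in enumerate(P))
Q = thermal_distribution(math.sinh(r) ** 2)
max_diff = max(abs(p - q) for p, q in zip(P, Q))
print(sum(P), mean_energy, math.sinh(r) ** 2, max_diff)
```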

In continuous-variable teleportation [10], Alice holds a state $|t\rangle$ she wishes to send
to Bob, and one half of the shared state $|s_r\rangle$. She measures the difference of position
coordinates of these states, $x_m = x_t - x_A$, and the sum of momentum coordinates,
$p_m = p_t + p_A$. These are commuting observables, and so can be simultaneously
determined. She sends these measurement outcomes to Bob, who then displaces his
half of the shared state using $D(x_m + ip_m)$.

Using continuous-variable teleportation, Alice can simulate a quantum Gaussian
channel with $k = 1$, average input energy $S$ and noise $N$. She does so by sending the value $x_m + ip_m$
over a classical complex Gaussian channel with average input energy $S + \cosh^2 r$
and noise $N - e^{-2r}$. This gives a bound equal to the classical capacity of this channel:

$$C_E \le \log\Big(1 + \frac{S + \cosh^2 r}{N - e^{-2r}}\Big). \qquad (82)$$

Finding the $r$ which minimizes this expression gives

$$e^{2r} = \frac{D_1 + 1}{N}, \qquad (83)$$

where

$$D_1 = \sqrt{(N + 1)^2 + 4NS} \qquad (84)$$

is the value of the variable $D$ defined in Eq. (77) when we set $k = 1$. This gives the
bound

$$C_E \le \log\Big(1 + \frac{N(2S + 1) + 1 + D_1}{2N^2}\Big). \qquad (85)$$
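The minimization leading to (83) can be checked by brute force. In this sketch the grid search, its range $r \le 5$, and the helper names are illustrative choices of ours. The closed-form optimal value in `closed_form` comes from our own algebra (substituting (83) into (82)), so treat it as something the check confirms rather than as a quoted result.

```python
import math

def teleport_ratio(S, N, r):
    """Signal-to-noise ratio inside the logarithm of Eq. (82)."""
    return (S + math.cosh(r) ** 2) / (N - math.exp(-2 * r))

def closed_form(S, N):
    """Minimizing r from Eqs. (83)-(84), plus the ratio obtained by substituting it into (82)."""
    D1 = math.sqrt((N + 1) ** 2 + 4 * N * S)
    r_star = 0.5 * math.log((D1 + 1) / N)
    return r_star, (N * (2 * S + 1) + 1 + D1) / (2 * N ** 2)

def grid_minimum(S, N, steps=100000, r_max=5.0):
    """Brute-force minimization of Eq. (82) over r, keeping N - e^{-2r} > 0."""
    r_low = max(0.0, -0.5 * math.log(N)) + 1e-9
    best = min((teleport_ratio(S, N, r_low + (r_max - r_low) * i / steps),
                r_low + (r_max - r_low) * i / steps) for i in range(steps + 1))
    return best[1], best[0]   # (minimizing r, minimal ratio)

for S, N in [(1.0, 2.0), (2.0, 0.5)]:
    print(closed_form(S, N), grid_minimum(S, N))
```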



Similarly, if Alice uses superdense coding [11] to send a continuous variable to
Bob, her protocol simulates a classical Gaussian channel. The average energy input
to this channel is $S - \sinh^2 r$ and the noise is $N + e^{-2r}$, so we obtain the bound

$$C_E \ge \log\Big(1 + \frac{S - \sinh^2 r}{N + e^{-2r}}\Big). \qquad (86)$$

Maximizing this expression, we find the maximum is at $e^{2r} = (D_1 - 1)/N$, and the
bound obtained is

$$C_E \ge \log\Big(1 + \frac{N(2S + 1) + 1 - D_1}{2N^2}\Big). \qquad (87)$$

Note that the bounds (82) and (86) reduce to the $k = 1$ cases of the bounds (79) and (80) when there
is no entanglement in the squeezed state, i.e., when $r = 0$.
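The maximization of (86) can be checked the same way. In this sketch (helper names ours), $r$ ranges over $0 \le r \le \operatorname{asinh}\sqrt{S}$, so the signal energy $S - \sinh^2 r$ stays nonnegative. The maximizer is the one quoted in the text. The maximal value is our own algebra (substituting it into (86)), which the grid search confirms. The $r = 0$ value reproduces the unentangled lower bound $S/(N+1)$.

```python
import math

def dense_ratio(S, N, r):
    """Signal-to-noise ratio inside the logarithm of Eq. (86)."""
    return (S - math.sinh(r) ** 2) / (N + math.exp(-2 * r))

def closed_form(S, N):
    """Maximizer e^{2r} = (D1 - 1)/N from the text, and the resulting ratio (our algebra)."""
    D1 = math.sqrt((N + 1) ** 2 + 4 * N * S)
    r_star = 0.5 * math.log((D1 - 1) / N)
    return r_star, (N * (2 * S + 1) + 1 - D1) / (2 * N ** 2)

def grid_maximum(S, N, steps=100000):
    """Brute-force maximization of Eq. (86) over 0 <= r <= asinh(sqrt(S))."""
    r_max = math.asinh(math.sqrt(S))
    best = max((dense_ratio(S, N, r_max * i / steps), r_max * i / steps)
               for i in range(steps + 1))
    return best[1], best[0]   # (maximizing r, maximal ratio)

for S, N in [(1.0, 2.0), (2.0, 0.5)]:
    print(closed_form(S, N), grid_maximum(S, N), dense_ratio(S, N, 0.0))
```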

B. The Amplitude Damping Channel 

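The amplitude damping channel is a qubit channel in which the state $|1\rangle$ decays
to $|0\rangle$ by attenuation, with no other noise. This
channel can be described by two Kraus operators,

$$A_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad A_1 = \begin{pmatrix} 0 & \sqrt{p} \\ 0 & 0 \end{pmatrix},$$

where $p$ is the damping probability, i.e., the probability that $|1\rangle$ decays to $|0\rangle$.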
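The maximization over $\rho$ to find $C_E$ can be reduced to an optimization over one
parameter. The channel commutes with the phase flip $\sigma_z$, and the quantity being maximized is concave in $\rho$. Averaging $\rho$ with $\sigma_z\rho\sigma_z$ therefore cannot decrease it, so the optimal $\rho$ may be taken to be of the form

$$\rho_x = \begin{pmatrix} 1 - x & 0 \\ 0 & x \end{pmatrix}.$$

This makes the optimization numerically tractable, and the dependence of $C_E$ on $p$
is shown in Fig. 7. As the damping probability $p$ goes to one, we can analytically
find the highest-order term in the expression for $C_E$, giving

$$C_E \sim -x(1-p)\log(1-p) \qquad (88)$$

for $0 < x < 1$. Here we use "$\sim$" to mean that the ratio of the two sides approaches
1 as $p$ goes to 1.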

For the same channel, $C_H$ can also be obtained by optimizing over a one-parameter
family which uses two signal states $\rho_{x,+}$ and $\rho_{x,-}$ with equal probability [16]. These
signal states are

$$\rho_{x,\pm} = \begin{pmatrix} 1 - x & \pm\sqrt{x(1-x)} \\ \pm\sqrt{x(1-x)} & x \end{pmatrix}. \qquad (89)$$

As $p$ goes to one, we can again analytically find the highest-order term for $C_H$, which
is

$$C_H \sim -x(1-x)(1-p)\log(1-p). \qquad (90)$$
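Here is a minimal sketch of the two one-parameter optimizations for the amplitude damping channel, in bits. Several assumptions should be flagged:

- The $C_E$ objective uses the standard fact that, for a diagonal input, $H((\mathcal{N}\otimes 1)(\Phi_\rho))$ equals the entropy of the environment, which ends up excited with probability $px$.
- The $C_H$ objective is restricted to the two-state family (89).
- The grid search is a crude illustrative choice.

The checks reproduce the endpoints (2 and 1 bits at $p = 0$, zero at $p = 1$). They also show the maximizing $x$ drifting toward 1 and 1/2 as $p \to 1$.

```python
import math

def h(x):
    """Binary entropy in bits."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def ce_objective(x, p):
    """H(rho_x) + H(N(rho_x)) - H((N x 1)(Phi_rho_x)) for rho_x = diag(1-x, x).

    The output is excited with probability (1-p)x; the joint term equals the
    entropy of the environment, which is excited with probability p*x.
    """
    return h(x) + h((1 - p) * x) - h(p * x)

def ch_objective(x, p):
    """Holevo quantity of rho_{x,+}, rho_{x,-} from Eq. (89), equal weights."""
    a = (1 - p) * x                    # excited population of either output
    c2 = (1 - p) * x * (1 - x)         # squared magnitude of the surviving coherence
    lam = 0.5 * (1 + math.sqrt((1 - 2 * a) ** 2 + 4 * c2))  # larger output eigenvalue
    return h(a) - h(lam)

def best(f, p, steps=2000):
    """Grid search over 0 < x < 1; returns (argmax, max)."""
    return max(((i / steps, f(i / steps, p)) for i in range(1, steps)), key=lambda t: t[1])

for p in (0.0, 0.5, 0.9, 0.99):
    ce, ch = best(ce_objective, p)[1], best(ch_objective, p)[1]
    print(p, ce, ch, ce / ch)
```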
Figure 7: (a) The capacity functions $C_E$ and $C_H$ for the amplitude damping channel
are plotted against the damping probability $p$. (b) The ratio $C_E/C_H$ is plotted. This
curve is so steep near $p = 1$ that for $p = 1 - 10^{-50}$, the computed value of the ratio
$C_E/C_H$ was only 3.8; the limiting value of 4 as $p \to 1$ was derived analytically.

Thus, as $p$ goes to 1, the values of $x$ maximizing $C_E$ and $C_H$ respectively approach
1 and 1/2, and the ratio $C_E/C_H$ approaches four. These functions are shown graphically in Fig. 7. In our previous paper [7], we showed that for the qubit depolarizing
channel, the ratio $C_E/C_H$ approached 3 as the depolarizing probability approached
1, and for the $d$-dimensional depolarizing channel, the ratio approached $d + 1$. We do
not know whether this ratio is bounded for finite-dimensional channels, although we
suspect it to be. If so, then the interesting question arises of how this bound depends
on the dimensions $\dim\mathcal{H}_{\rm in}$ and $\dim\mathcal{H}_{\rm out}$.$^4$

$^4$ A. Holevo has found a qubit channel where this ratio is 5.0798 [?].

IV. Classical Reverse Shannon Theorem

Shannon's celebrated noisy channel coding theorem established the ability of noisy
channels to simulate noiseless ones, and allowed a noisy channel's capacity to be
defined as the asymptotic efficiency of this simulation. The reverse problem, of using
a noiseless channel to simulate a noisy one, has received far less attention, perhaps
because noisy channels are not thought to be a useful resource in themselves (for
the same reason, there has been little interest in the reverse of water
desalination, i.e., efficiently making salty water from fresh water and salt). We show,
perhaps unsurprisingly, that any noisy discrete memoryless channel of capacity C can
be asymptotically simulated by C bits of noiseless forward communication from sender 
to receiver, given a source R of random information shared beforehand between sender 
and receiver. If this were not the case, characterization of the asymptotic properties 
of classical channels would require more than one parameter, because there would be 
cases where two channels of equal capacity could not simulate one another with unit 
asymptotic efficiency. In terms of the desalination analogy, water from two different
oceans might produce equal yields of fresh drinking water, yet still not be equivalent 
because they produced unequal yields of partly saline water suitable, say, for car 
washing. 

Although it is of some intrinsic interest as a result in classical information theory, 
we view the classical reverse Shannon theorem mainly as a heuristic aid in developing 
techniques that may eventually establish its quantum analog, namely the conjectured 
ability of all quantum channels of equal $C_E$ to simulate one another with unit
asymptotic efficiency in the presence of shared entanglement.

Here we show that any classical discrete memoryless channel N, of capacity C, 
can be asymptotically simulated by C uses of a noiseless binary channel, together 
with a supply of prior random information R shared between sender and receiver. 

The channel $N$ is defined by its stochastic transition matrix $N_{yx}$ between inputs
$x \in \{1, \ldots, d_I\}$ and outputs $y \in \{1, \ldots, d_O\}$. Let $N^n$ denote the extended channel consisting
of $n$ parallel applications of $N$, and mapping $x \in \{1, \ldots, d_I\}^n$ to $y \in \{1, \ldots, d_O\}^n$.

Theorem 2 (Classical Reverse Shannon Theorem) Let $N$ be a DMC with Shannon capacity $C$ and $\epsilon$ a positive constant. Then for each block size $n$ there is a
deterministic simulation protocol $S_n$ for $N^n$ which makes use of a noiseless forward
classical channel and prior random information (without loss of generality a Bernoulli
sequence $R$) shared between sender and receiver. When $R$ is chosen randomly, the
number of bits of forward communication used by the protocol $S_n$ on channel input
$x \in \{1, \ldots, d_I\}^n$ is a random variable; let it be denoted $m_n(x)$. The simulation is exactly
faithful in the sense that for all $n$ the stochastic matrix for $S_n$, when $R$ is chosen
randomly, is identical to that for $N^n$,

$$(S_n)_{yx} = (N^n)_{yx}, \qquad (91)$$

and it is asymptotically efficient in the sense that the probability that the protocol uses
more than $n(C + \epsilon)$ bits of forward communication approaches zero in the limit of
large $n$,

$$\lim_{n\to\infty}\ \max_{x \in \{1, \ldots, d_I\}^n} P\big(m_n(x) > n(C + \epsilon)\big) = 0. \qquad (92)$$

Note that the notion of simulation used here is stronger than the conventional
one. The conventional notion is used in the forward version of Shannon's noisy channel coding theorem, and in
the equation defining the generalized capacity of one quantum channel to simulate another.
There the simulations are required only to be asymptotically faithful, and their cost
$m$ is deterministically upper bounded by $n(C + \epsilon)$. By contrast, our simulations are
exactly faithful for all $n$, and their cost is upper bounded by $n(C + \epsilon)$ only with
probability approaching 1 in the limit of large $n$, for all $\epsilon > 0$. To convert one of
our simulations into a standard one, it suffices to discontinue the simulation and
substitute an arbitrary output whenever $m_n(x)$ is about to exceed $n(C + \epsilon)$.

To illustrate the central idea of the simulation, we prove the theorem first for a
binary symmetric channel (BSC), then extend the proof to a general discrete memoryless channel. Let $N$ be a binary symmetric channel of crossover probability $p$. Its
capacity $C$ is $1 - H_2(p) = 1 + p\log_2 p + (1 - p)\log_2(1 - p)$. To prove the theorem in
this case it suffices to show that for any $\epsilon > 0$, there is a sequence of simulation
protocols $S_n$ such that

$$(S_n)_{yx} = (N^n)_{yx} \qquad (93)$$

and

$$\lim_{n\to\infty}\ \max_{x \in \{0,1\}^n} P\big(m_n(x) > n(C + \epsilon)\big) = 0. \qquad (94)$$

The simulation protocol $S_n$ is as follows:

1. Before receiving the input $x \in \{0, 1\}^n$, Alice and Bob use the random information $R$ to choose a random set $Z(R, n)$ of $2^{n(C + \epsilon/2)}$ $n$-bit strings. [We use $\epsilon/2$,
rather than $\epsilon$, to keep the total overhead, including other costs, below $\epsilon$.]

2. Alice receives the $n$-bit input $x$.

3. Alice simulates the true channel $N^n$ within her laboratory, obtaining an $n$-bit "provisional output" $y$. Although this $y$ is distributed with the correct
probability for the channel output, she tries to avoid transmitting $y$ to Bob.
Doing so would require $n$ bits of forward communication, and she wishes
to simulate the channel accurately while using less forward communication.
Instead, where possible, she substitutes a member of the preagreed set $Z(R, n)$,
as we shall now describe.

4. Alice computes the Hamming distance $d = |x \oplus y|$ between $x$ and $y$.

5. Alice determines whether there are any strings in the preagreed set $Z(R, n)$
having the same Hamming distance $d$ from $x$ as $y$ does. If so, she selects a
random one of them, call it $y'$, and sends Bob $0i$, where $i$ is the approximately
$n(C + \epsilon/2)$-bit index of $y'$ within the set $Z(R, n)$. If not, she sends Bob the
string $1y$, the original unmodified $n$-bit string $y$, prefixed by a 1.

6. Bob emits $y'$ or $y$, whichever he has received, as the final output of the simulation.

It can readily be seen that the probability of failure in step 5 (i.e., of there
being no string of the correct Hamming distance in the preagreed set $Z(R, n)$)
decreases exponentially with $n$ as long as $\epsilon > 0$. For a typical distance $d \approx pn$, a uniformly random $n$-bit string lies at distance $d$ from $x$ with probability about $2^{n(H_2(p) - 1)} = 2^{-nC}$. The expected number of members of $Z(R, n)$ at that distance is therefore about $2^{n\epsilon/2}$, up to polynomial factors. Thus the probability of needing to
use more than $n(C + \epsilon)$ bits of forward communication approaches zero, as required
by Eq. (94). On the other hand, regardless of whether step 5 succeeds or fails,
the final output is correctly distributed (satisfying Eq. (93)). It has the correct
distribution of Hamming distances from the input $x$, and, for each Hamming distance,
is equidistributed among all strings at that Hamming distance from $x$. The theorem
follows.
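The protocol can be simulated directly for small $n$. In this sketch (helper names ours), a single pseudo-random generator plays the role of both the shared randomness $R$ and Alice's private coins. A fresh set $Z(R, n)$ is drawn on every run, matching the requirement that $R$ be chosen randomly. For $n$ this small the communication saving is not visible, since failures in step 5 are common. What the run illustrates is that the output statistics are exactly those of the BSC, whether or not step 5 succeeds.

```python
import math
import random

def h2(p):
    """Binary entropy H_2(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def simulate_bsc_once(x, n, p, eps, rng):
    """One run of the BSC protocol S_n; x is an n-bit integer. Returns (output, bits sent)."""
    C = 1 - h2(p)
    size = max(1, round(2 ** (n * (C + eps / 2))))
    Z = [rng.getrandbits(n) for _ in range(size)]                # step 1: preagreed set
    noise = sum(1 << i for i in range(n) if rng.random() < p)  # step 3: true channel
    y = x ^ noise
    d = bin(x ^ y).count("1")                                    # step 4: Hamming distance
    matches = [z for z in Z if bin(x ^ z).count("1") == d]     # step 5
    if matches:
        return rng.choice(matches), 1 + math.ceil(math.log2(size))  # send "0i"
    return y, 1 + n                                                  # send "1y"

def empirical_flip_rates(x, n, p, eps, trials, seed=0):
    """Per-position flip frequencies of the simulated channel, and mean bits sent."""
    rng = random.Random(seed)
    flips = [0] * n
    bits = 0
    for _ in range(trials):
        out, cost = simulate_bsc_once(x, n, p, eps, rng)
        bits += cost
        for i in range(n):
            flips[i] += ((x ^ out) >> i) & 1
    return [f / trials for f in flips], bits / trials

rates, mean_bits = empirical_flip_rates(0b1011001110, 10, 0.2, 0.4, 10000, seed=1)
print(rates, mean_bits)
```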

For a general discrete memoryless channel the protocol must be modified to take
account of the nonbinary input and output alphabets. It must also handle the fact that the output
entropy may be different for different inputs, unlike the BSC case. The notion of
Hamming distance also needs to be generalized. The new protocol uses the notion
of type class. Two $n$-character strings belong to the same type class if they
have equal letter frequencies (for example four a's, three b's, twelve c's, etc.), and
are therefore equivalent under some permutation of letter positions. We will consider
input type classes (ITCs) and joint input/output type classes (JTCs). The latter are
defined as sets of input/output pairs $(x, y)$ equivalent under some common permutation of the input and output letter positions. In other words, $(x_1, y_1)$ and $(x_2, y_2)$
belong to the same JTC if and only if there exists a permutation of letter positions,
$\pi$, such that $\pi(x_1) = x_2$ and $\pi(y_1) = y_2$. Evidently, for any given input and output
alphabet size, the number of ITCs and the number of JTCs are each polynomial in
$n$. Let $k = 1, 2, \ldots, K_n$ index the ITCs, and $\ell = 1, 2, \ldots, L_n$ the JTCs for inputs of length
$n$. The JTC will be our generalization of the Hamming distance, since the transition
probability $(N^n)_{yx}$ is equal for all pairs $(x, y)$ in a given JTC. The new protocol
follows:

1. Before receiving the input $x \in \{1, \ldots, d_I\}^n$, Alice and Bob use the common random
information $R$ to preagree on $K_n$ random sets $\{Z(R, n, k) : k = 1, \ldots, K_n\}$ of
$n$-letter output strings, one for each ITC. The set $Z(R, n, k)$ has cardinality
$2^{n(C_k + \epsilon/2)}$, where $C_k \le C$ is the channel's capacity for inputs in the $k$'th ITC
(in other words, $1/n$ times the channel's input:output mutual information on
$n$-letter inputs uniformly distributed over the $k$'th ITC). In the
BSC case, the members of $Z(R, n)$ were chosen randomly from a uniform
distribution on the output space. Here, by contrast, the elements of $Z(R, n, k)$ are chosen randomly
from the (in general nonuniform) output distribution induced by a uniform
distribution of channel inputs over the $k$'th ITC.

2. Alice receives the $n$-letter input $x$, determines which ITC, $k$, it belongs to, and
sends $k$ to Bob, using $o(n)$ bits to do so.

3. Alice simulates the true channel $N^n$ in her laboratory, obtaining an $n$-letter
provisional output string $y$. Although this $y$ is distributed with the correct
probability for the channel input $x$, she tries to avoid transmitting $y$ to Bob,
because to do so would require too much forward communication. Instead she
proceeds as described below.
4. Alice computes the index $\ell$ of the JTC to which the input/output pair $(x, y)$
belongs. As noted above, this JTC index is the generalization of the Hamming
distance, which we used in the BSC case.

5. Alice determines whether there are any output strings in the preagreed set
$Z(R, n, k)$ having the same JTC index relative to $x$ as $y$ does. If so, she selects
a random one of them, call it $y'$, and sends Bob the string $0i$, where $i$ is the
approximately $n(C_k + \epsilon/2)$-bit index of $y'$ within the set $Z(R, n, k)$. If not, she
sends Bob the string $1y$.

6. Bob emits $y'$ or $y$, whichever he has received, as the final output of the simulation.
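The sketch below makes the JTC notion concrete. It groups all input/output pairs of a small, hypothetical 3-input, 2-output channel by joint type and checks the two facts used above:

- The transition probability is constant on each JTC.
- The number of JTCs grows only polynomially in $n$. Here it equals the number of ways to distribute $n$ positions among the 6 letter pairs, $\binom{n+5}{5}$.

The channel matrix and helper names are our own illustrative choices.

```python
import math
from collections import Counter
from itertools import product

def joint_type(x, y):
    """Joint type of (x, y): counts of each letter pair (x_i, y_i), independent of order."""
    return tuple(sorted(Counter(zip(x, y)).items()))

def transition_prob(N, x, y):
    """(N^n)_{yx} = prod_i N[y_i][x_i] for a memoryless channel."""
    prob = 1.0
    for xi, yi in zip(x, y):
        prob *= N[yi][xi]
    return prob

# Hypothetical DMC with 3 inputs and 2 outputs; column x of N is P(y | x).
N = [[0.9, 0.5, 0.2],
     [0.1, 0.5, 0.8]]
n = 4
classes = {}
for x in product(range(3), repeat=n):
    for y in product(range(2), repeat=n):
        classes.setdefault(joint_type(x, y), []).append(transition_prob(N, x, y))

print(len(classes), math.comb(n + 5, 5))   # number of JTCs vs. the count of joint types
```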

This protocol deals with the problem of dependence of output entropy on input
by encoding each ITC separately. Within any one ITC, the output entropy is independent of the input. The communication cost of telling Bob in which ITC the input
lies is polylogarithmic in $n$, and so asymptotically negligible compared to $n$. Because
one cannot increase the capacity of a channel by restricting its input, $nC$ is an upper
bound on the input:output mutual information $nC_k$ for inputs restricted to a particular
ITC. Moreover, for any ITC $k$ and any input $x$ in that ITC, the input:output pairs
generated by the true channel $N^n$ will be narrowly concentrated, for large $n$, on JTCs
whose transition frequencies approximate (to within $O(\sqrt{n})$) their asymptotic values.
Therefore, as before, for any $\epsilon > 0$, the probability of failure in step 5 will decrease
exponentially with $n$. And as before, the simulated transition probability $(S_n)_{yx}$ on
each ITC is exactly correct even for finite $n$. The reverse Shannon theorem for a
general DMC follows, as does the following corollary.

Corollary 1 (Efficient simulation of one noisy channel by another) 

In the presence of shared random information between sender and receiver, any two 
classical channels of equal capacity can simulate one another, in the sense of the equation defining
the generalized capacity of one channel to simulate another, with unit asymptotic efficiency.

From the proof of the main theorem it can also be seen that when inputs to the 
noisy channel being simulated come from a source having a frequency distribution q 
differing from the optimal one p for which capacity C is attained, then the asymptotic 
cost of simulating the channel on that source is correspondingly less. 

Corollary 2 (Efficient simulation of noisy channels on constrained sources)

Let $N$ be a DMC, $q$ be a probability distribution over the source alphabet, and $I(N, q)$
be the channel's constrained capacity, equal to the single-letter input:output mutual
information on source $q$. Then, in the presence of shared random information $R$ between sender and receiver, the action of $N$ on any extended source having $q$ for each
of its marginal distributions can be simulated in the manner of Theorem 2. The simulation has perfect fidelity and a forward noiseless communication cost asymptotically approaching
$I(N, q)$: viz. for all $\epsilon > 0$, $\lim_{n\to\infty} P\big(m_n > n(I(N, q) + \epsilon)\big) = 0$. Here $m_n$ denotes the number of
bits of forward communication used by the protocol when $R$ is chosen randomly with
a uniform distribution and inputs are chosen randomly according to the constrained
extended source.
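The constrained capacity $I(N, q)$ in Corollary 2 is an ordinary single-letter mutual information. The minimal sketch below uses the BSC from the previous section as a hypothetical example; the helper name is ours. With uniform $q$ it reproduces $C = 1 - H_2(p)$. A skewed source gives a smaller value, i.e., a cheaper simulation.

```python
import math

def mutual_information(N, q):
    """I(N, q) in bits for a DMC with N[y][x] = P(y | x) and input distribution q."""
    py = [sum(N[y][x] * q[x] for x in range(len(q))) for y in range(len(N))]
    total = 0.0
    for x, qx in enumerate(q):
        for y in range(len(N)):
            if qx > 0 and N[y][x] > 0:
                total += qx * N[y][x] * math.log2(N[y][x] / py[y])
    return total

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
capacity = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # 1 - H_2(p)
print(mutual_information(bsc, [0.5, 0.5]), capacity)
print(mutual_information(bsc, [0.9, 0.1]))   # constrained source
```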

V. Discussion — Quantum Reverse Shannon Conjecture 

We conjecture (QRSC) that in the presence of unlimited shared entanglement between
sender and receiver, all quantum channels of equal $C_E$ can simulate one another with
unit asymptotic efficiency, in the sense of the equation defining the generalized capacity of one quantum channel to simulate another. By the results of the previous
section, the conjecture holds for classical channels (where the shared random information required for the classical reverse Shannon theorem is obtained from shared
entanglement). In our previous paper [7] we showed that the QRSC also holds for
another class of channels, the so-called Bell-diagonal channels, which commute with
teleportation and superdense coding. For these channels, the single-use entanglement-assisted classical capacity of the channel via superdense coding is equal to the forward
classical communication cost of simulating it via teleportation. The QRSC asserts
that this equality holds asymptotically for all quantum channels, even when (as for the
amplitude damping channel) it does not hold for single uses of the channel. We
hope that the arguments used to prove the classical reverse Shannon theorem can be
extended to demonstrate its quantum analog.

If the QRSC is true, one useful corollary would be the inability of a classical
feedback channel from Bob to Alice to increase $C_E$. A causality argument shows
that a feedback channel cannot increase $C_E$ for noiseless quantum channels. Suppose we
could simulate noisy quantum channels by noiseless ones. Then if a
feedback channel increased $C_E$ for any noisy channel, it would have to increase $C_E$
for noiseless ones as well, violating causality.

We thank Igor Devetak, David DiVincenzo, Alexander Holevo, Michael Nielsen 
and Barbara Terhal for helpful discussions, and the referees for careful reading and 
advice resulting in significant improvements. 